TITLE OF THE INVENTION

SINGLE NUCLEOTIDE POLYMORPHISMS AND THEIR USE IN GENETIC ANALYSIS



FIELD OF THE INVENTION 

The present invention is in the field of recombinant DNA technology. More specifically, the invention is directed to molecules and methods suitable for identifying single nucleotide polymorphisms in the genome of an animal, especially a horse or a human, and for using such sites to analyze identity, ancestry or genetic traits.

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of U.S. Patent 
Application Serial No. 08/145,145 (filed November 3, 1993). 

BACKGROUND OF THE INVENTION

The capacity to genotype an animal, plant or microbe is of fundamental importance to forensic science, medicine, epidemiology and public health, and to the breeding and exhibition of animals. Such a capacity is needed, for example, to determine the identity of the causative agent of an infectious disease, to determine whether two individuals are related, or to establish whether a particular animal, such as a horse, is a thoroughbred.

The analysis of identity and parentage, along with the capacity to diagnose disease, is also of central concern to human, animal and plant genetic studies, particularly forensic or paternity evaluations, and to the evaluation of an individual's risk of genetic disease. Such goals have been pursued by analyzing variations in the DNA sequences that distinguish the DNA of one individual from that of another.

If such a variation alters the lengths of the fragments that are generated by restriction endonuclease cleavage, the variations are referred to as restriction fragment length polymorphisms ("RFLPs"). RFLPs have been widely used in human and animal genetic analyses (Glassberg, J., UK Patent Application 2135774; Skolnick, M.H. et al., Cytogen. Cell Genet. 32:58-67 (1982); Botstein, D. et al., Amer. J. Hum. Genet. 32:314-331 (1980); Fischer, S.G. et al., PCT Application WO90/13668; Uhlen, M., PCT Application WO90/11369). Where a heritable trait can be linked to a particular RFLP, the presence of the RFLP in a target animal can be used to predict the likelihood that the animal will also exhibit the trait. Statistical methods have been developed to permit the multilocus analysis of RFLPs such that complex traits that are dependent upon multiple alleles can be mapped (Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 83:7353-7357 (1986); Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:2363-2367 (1987); Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S. et al., Genetics 121:185-199 (1989), all herein incorporated by reference). Such methods can be used to develop a genetic map, as well as to develop plants or animals having more desirable traits (Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S. et al., Genetics 121:185-199 (1989)).

In some cases, the DNA sequence variations are in regions of the genome that are characterized by short tandem repeats (STRs) consisting of tandemly repeated di- or tri-nucleotide motifs. These tandem repeats are also referred to as "variable number tandem repeat" ("VNTR") polymorphisms. VNTRs have been used in identity and paternity analysis (Weber, J.L., U.S. Patent 5,075,217; Armour, J.A.L. et al., FEBS Lett. 307:113-115 (1992); Jones, L. et al., Eur. J. Haematol. 39:144-147 (1987); Horn, G.T. et al., PCT Application WO91/14003; Jeffreys, A.J., European Patent Application 370,719; Jeffreys, A.J., U.S. Patent 5,175,082; Jeffreys, A.J. et al., Amer. J. Hum. Genet. 39:11-24 (1986); Jeffreys, A.J. et al., Nature 316:76-79 (1985); Gray, I.C. et al., Proc. R. Acad. Soc. Lond. 243:241-253 (1991); Moore, S.S. et al., Genomics 10:654-660 (1991); Jeffreys, A.J. et al., Anim. Genet. 18:1-15 (1987); Hillel, J. et al., Anim. Genet. 20:145-155 (1989); Hillel, J. et al., Genet. 124:783-789 (1990)) and are now being used in a large number of genetic mapping studies.

A third class of DNA sequence variation results from single nucleotide polymorphisms (SNPs) that exist between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. In some cases, such polymorphisms comprise mutations that are the determinative characteristic in a genetic disease. Indeed, such mutations may affect a single nucleotide in a protein-encoding gene in a manner sufficient to actually cause the disease (e.g., hemophilia, sickle-cell anemia). In many cases, however, these SNPs are in noncoding regions of a genome. Despite the central importance of such polymorphisms in modern genetics, no practical method has been developed that permits highly parallel analysis of many SNP alleles in the genetic analysis of two or more individuals.

The present invention provides such an improved method. Indeed, the present invention provides methods and gene sequences that permit the genetic analysis of identity and parentage, and the diagnosis of disease, by discerning the variation of single nucleotide polymorphisms.

SUMMARY OF THE INVENTION

The present invention is directed to molecules that comprise single nucleotide polymorphisms (SNPs) that are present in mammalian DNA, and in particular, to equine and human genomic DNA polymorphisms. The invention is directed to methods for (i) identifying novel single nucleotide polymorphisms, (ii) repeatedly analyzing and testing these SNPs in different samples, and (iii) exploiting the existence of such sites in the genetic analysis of single animals and populations of animals.
The analysis (genotyping) of such sites is useful in determining identity, ancestry, predisposition to genetic disease, the presence or absence of a desired trait, etc. In detail, the invention provides a nucleic acid primer molecule having a polynucleotide sequence complementary to an "invariant" nucleotide sequence of a genomic DNA segment of a mammal, the genomic segment being located immediately 3'-distal to a single nucleotide polymorphic site, X, of a single nucleotide polymorphic allele of the mammal; and wherein template-dependent extension of the primer molecule adds a single nucleotide, the single nucleotide being complementary to the nucleotide, X, of the single nucleotide polymorphic allele. The invention particularly concerns the embodiment wherein the mammal is selected from the group consisting of humans, non-human primates, dogs, cats, cattle, sheep, and horses.

The invention particularly concerns the embodiments wherein the mammal is a horse, and wherein the nucleic acid molecule has a nucleotide sequence selected from the group consisting of SEQ ID NO:(2n+1) [refer to Table 1], wherein n is an integer selected from the group consisting of 0 through 35, or wherein the sequence of the immediately 3'-distal segment includes a sequence selected from the group consisting of SEQ ID NO:(2n+2), wherein n is an integer selected from the group consisting of 0 through 35.
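By way of illustration only, the pairing of SEQ ID numbers described above can be written as a short Python sketch. The function names `seq_id_pair` and `snp_index` are hypothetical and are not part of Table 1; the sketch simply confirms that n = 0 through 35 accounts for each of SEQ ID NO:1 through SEQ ID NO:72 exactly once.

```python
def seq_id_pair(n: int) -> tuple[int, int]:
    """(5'-proximal, 3'-distal) SEQ ID NOs for equine SNP index n (0..35)."""
    if not 0 <= n <= 35:
        raise ValueError("n must be an integer from 0 through 35")
    return 2 * n + 1, 2 * n + 2

def snp_index(seq_id: int) -> int:
    """Inverse mapping: the SNP index n to which a SEQ ID NO (1..72) belongs."""
    if not 1 <= seq_id <= 72:
        raise ValueError("SEQ ID NO must lie between 1 and 72")
    return (seq_id - 1) // 2

# Every SEQ ID NO from 1 through 72 is covered exactly once by n = 0..35.
all_ids = sorted(i for n in range(36) for i in seq_id_pair(n))
assert all_ids == list(range(1, 73))
```

For example, `seq_id_pair(0)` returns `(1, 2)`, the pair that Table 1 assigns to clone 177-2.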

The invention also provides a nucleic acid molecule having a sequence complementary to a sequence selected from the group consisting of SEQ ID NO:1 through SEQ ID NO:72. The invention also provides a set of at least two of such nucleic acid molecules.

The invention also provides a set of at least two nucleic acid molecules, wherein at least one of the nucleic acid molecules has a sequence complementary to a sequence selected from the group consisting of SEQ ID NO:1 through SEQ ID NO:72.

The invention also provides a method for determining the extent of genetic similarity between DNA of a target horse and DNA of a reference horse, which comprises the steps:

A) determining, for a single nucleotide polymorphism of the target horse, and for a corresponding single nucleotide polymorphism of the reference horse, whether the polymorphisms contain the same single nucleotide at their respective polymorphic sites; and

B) using the comparison to determine the extent of genetic similarity between the target horse and the reference horse.
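Steps A and B leave the similarity measure itself open. A minimal Python sketch, assuming each horse's genotypes are stored as two-letter strings keyed by SNP name (the SNP names, the genotypes and the allele-sharing score are all hypothetical choices for illustration), compares corresponding sites and reports the average fraction of alleles shared:

```python
from collections import Counter

def shared_alleles(g1: str, g2: str) -> int:
    """Alleles (0, 1 or 2) shared by two diploid genotypes such as 'CT' and 'CC'."""
    return sum((Counter(g1) & Counter(g2)).values())

def similarity(target: dict[str, str], reference: dict[str, str]) -> float:
    """Step B sketch: mean allele sharing over SNPs typed in both horses, scaled to 0..1."""
    common = target.keys() & reference.keys()
    if not common:
        raise ValueError("no SNP was typed in both horses")
    shared = sum(shared_alleles(target[snp], reference[snp]) for snp in common)
    return shared / (2 * len(common))

# Hypothetical genotypes at two SNPs (C/T at 177-2, A/G at a second site).
target_horse = {"177-2": "CT", "snp_b": "AA"}
reference_horse = {"177-2": "CC", "snp_b": "AG"}
print(similarity(target_horse, reference_horse))  # 0.5
```

Scoring shared alleles rather than exact genotype matches is one design choice among several; a simple identical/non-identical count per site would follow step A equally well.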

The invention also concerns the embodiment of such method wherein the polymorphic sites are flanked by (1) an immediately 5'-proximal sequence selected from the group consisting of SEQ ID NO:(2n+1), and (2) an immediately 3'-distal sequence selected from the group consisting of SEQ ID NO:(2n+2); wherein n is an integer selected from the group consisting of 0 through 35.

The invention particularly concerns the embodiment wherein, in step A, the determination is accomplished by a method having the sub-steps:

(a) incubating a sample of nucleic acid containing the single nucleotide polymorphism of the target horse, or the single nucleotide polymorphism of the reference horse, in the presence of a nucleic acid primer and at least one dideoxynucleotide derivative, under conditions sufficient to permit a polymerase-mediated, template-dependent extension of the primer, the extension causing the incorporation of a single dideoxynucleotide at the 3'-terminus of the primer, the single dideoxynucleotide being complementary to the single nucleotide of the polymorphic site of the polymorphism;

(b) permitting the template-dependent extension of the primer molecule, and the incorporation of the single dideoxynucleotide; and

(c) determining the identity of the nucleotide incorporated opposite the polymorphic site, the identified nucleotide being complementary to the nucleotide of the polymorphic site.
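The logic of sub-steps (a) through (c) can be modeled with plain strings. In this sketch (an idealization, not a laboratory protocol), the template strand and the primer are written 5' to 3', the primer anneals only where its reverse complement occurs exactly in the template, and the only substrates are the dideoxynucleotides passed in, so at most one base can be added. The names `single_base_extension` and `revcomp` and the example sequences are hypothetical.

```python
from typing import Iterable, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def single_base_extension(template: str, primer: str,
                          ddntps: Iterable[str] = ("A", "C", "G", "T")) -> Optional[str]:
    """Base added to the primer's 3' end by a chain-terminating ddNTP, or None.

    The primer pairs with the template segment equal to revcomp(primer); that
    segment lies immediately 3'-distal to the polymorphic site X, so X is the
    template base just 5' of it. Only the first exact match is considered.
    """
    site = template.find(revcomp(primer))
    if site <= 0:              # primer does not anneal, or no base X lies 5' of it
        return None
    incoming = template[site - 1].translate(COMPLEMENT)  # base complementary to X
    return incoming if incoming in set(ddntps) else None

# Hypothetical template: 5'-proximal AAGGTC, polymorphic site X = C, 3'-distal TTGACCA.
template = "AAGGTC" + "C" + "TTGACCA"
primer = revcomp("TTGACCA")                      # "TGGTCAA"
print(single_base_extension(template, primer))  # G
```

Supplying only some dideoxynucleotides through `ddntps` mirrors the embodiments that use a subset of ddNTPs; a site whose complement is not supplied is reported as `None`.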

The invention further concerns the embodiment of the above methods wherein the template-dependent extension of the primer is conducted in the presence of at least two dideoxynucleotide triphosphate derivatives selected from the group consisting of ddATP, ddTTP, ddCTP and ddGTP, but in the absence of dATP, dTTP, dCTP and dGTP.

The invention particularly concerns the sub-embodiments of the above methods wherein the nucleic acid of the sample is amplified in vitro prior to the incubation, and/or the primer is immobilized to a solid support.

The invention further concerns the embodiment of the above methods wherein a non-invasive swab is used to collect the sample of DNA.

The invention further provides a method for determining the probability that a target horse will have a particular trait, which comprises the steps:

A) determining the identity of a single nucleotide that is present at a polymorphic site of an equine single nucleotide polymorphism in more than 51% of a set of reference horses;

B) determining whether a single nucleotide present at a polymorphic site of a corresponding single nucleotide polymorphism of the target horse has the same identity as the single nucleotide present at the polymorphic site of the 51% of reference horses exhibiting the trait; and

C) using the determination of step B to establish the probability that the target horse will have the particular trait.
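Steps A and B can be sketched as follows, under the simplifying assumption (hypothetical, for illustration) that each reference horse exhibiting the trait contributes a single nucleotide call at the site. The way step C turns the result of step B into a probability is not specified above, so the sketch stops at the comparison.

```python
from collections import Counter
from typing import Optional

def majority_nucleotide(reference_calls: list[str], threshold: float = 0.51) -> Optional[str]:
    """Step A: nucleotide present in more than `threshold` of the reference horses, if any."""
    if not reference_calls:
        raise ValueError("at least one reference horse is required")
    base, count = Counter(reference_calls).most_common(1)[0]
    return base if count / len(reference_calls) > threshold else None

def shares_majority_nucleotide(target_call: str, reference_calls: list[str]) -> Optional[bool]:
    """Step B: compare the target's nucleotide with the majority one; None if there is none."""
    base = majority_nucleotide(reference_calls)
    return None if base is None else target_call == base

# Hypothetical calls at one SNP for four reference horses exhibiting the trait.
reference = ["C", "C", "C", "T"]
print(majority_nucleotide(reference))              # C
print(shares_majority_nucleotide("T", reference))  # False
```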

The invention further provides a method for creating a genetic map of unique sequence equine polymorphisms which comprises the steps:

A) identifying at least one pair of inter-breeding reference horses, wherein each of the pairs of horses is characterized by having a first and a second reference horse,

the first reference horse having:

two alleles (i) and (ii), the alleles each being single nucleotide polymorphic alleles having a single nucleotide polymorphic site;

the second reference horse having:

a corresponding allele (i') to the allele (i) of the first reference horse, wherein the allele (i') has a single nucleotide polymorphic site, and wherein the single nucleotide present at the polymorphic site of the allele (i') differs from the single nucleotide present at the polymorphic site of the allele (i) of the first reference horse; and

B) identifying, in a progeny of at least one of the pairs of inter-breeding reference horses, the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of the alleles (i) and (i'), and the single nucleotide present at a single nucleotide polymorphic site of a corresponding allele of the alleles (ii) and (ii'); and

C) determining the extent of genetic linkage between the alleles (i) and (ii), to thereby create the genetic map.
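Step C can be illustrated by counting recombinant progeny. The sketch below assumes (hypothetically) that the pair of nucleotides each progeny received from the first reference horse, one at allele (i) and one at allele (ii), has already been resolved from the genotypes, and that the first horse's two parental haplotypes are known. The recombination fraction is then converted to a map distance with Haldane's mapping function, a standard choice that the method above does not itself prescribe.

```python
import math

def recombination_fraction(progeny: list[tuple[str, str]],
                           parental: set[tuple[str, str]]) -> float:
    """Share of transmitted (allele i, allele ii) haplotypes that are not parental."""
    if not progeny:
        raise ValueError("no progeny haplotypes supplied")
    recombinant = sum(1 for hap in progeny if hap not in parental)
    return recombinant / len(progeny)

def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans for recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical data: the first reference horse carries haplotypes C-A and T-G.
parental = {("C", "A"), ("T", "G")}
progeny = [("C", "A")] * 4 + [("T", "G")] * 4 + [("C", "G"), ("T", "A")]
r = recombination_fraction(progeny, parental)
print(r, round(haldane_cm(r), 1))  # 0.2 25.5
```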

The invention further provides a method for predicting whether a target horse will exhibit a predetermined trait which comprises the steps:

A) identifying one or more alleles associated with the trait, each allele being a single nucleotide polymorphic allele having a single nucleotide polymorphic site;

B) determining, for each of the single nucleotide polymorphic alleles, a nucleotide present at the allele's polymorphic site in a reference horse exhibiting the trait, to thereby define a set of single nucleotides at a set of polymorphic sites that are present in a reference horse exhibiting the trait;

C) determining the identity of single nucleotides present at corresponding single nucleotide polymorphic alleles of the target horse; and

D) comparing the identity of the single nucleotides present at the polymorphic sites of the polymorphisms of the reference animal with the single nucleotides present at the corresponding single nucleotide polymorphic alleles of the target horse.
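Step D amounts to a site-by-site comparison. In this minimal sketch (site names and data are hypothetical), the reference horse's trait profile from step B and the target horse's calls from step C are dictionaries keyed by SNP; sites not typed in the target horse are reported as `None` rather than counted as mismatches.

```python
from typing import Optional

def compare_to_trait_profile(reference_profile: dict[str, str],
                             target_calls: dict[str, str]) -> dict[str, Optional[bool]]:
    """Step D: for each trait-associated site, does the target carry the reference nucleotide?"""
    return {site: (target_calls[site] == base) if site in target_calls else None
            for site, base in reference_profile.items()}

profile = {"snp_1": "C", "snp_2": "G", "snp_3": "A"}   # reference horse exhibiting the trait
target = {"snp_1": "C", "snp_2": "A"}                  # snp_3 not yet typed
print(compare_to_trait_profile(profile, target))
# {'snp_1': True, 'snp_2': False, 'snp_3': None}
```

How the matches are weighed into a prediction is not specified above; counting the `True` entries is only the simplest option.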



The invention further provides a method for identifying a single nucleotide polymorphic site which comprises:

A) isolating a fragment of genomic DNA of a reference 
organism; 

B) sequencing the fragment of DNA to thereby determine the 
nucleotide sequence of a segment of the fragment, the 
segment being of a length sufficient to define the 
nucleotide sequence of a pair of oligonucleotide primers 
capable of mediating the specific amplification of the 
fragment; 

C) using the oligonucleotide primers to mediate the specific 
amplification of DNA obtained from a plurality of other 
organisms of the same species as the reference organism; 
and 

D) determining the nucleotide sequences of the amplified DNA 
molecules of step C, and comparing the sequence of the 
amplified molecules with the sequence of the fragment of 
the reference organism to thereby identify a single 
nucleotide polymorphic site. 
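Step D can be sketched as a column-by-column comparison. The sketch assumes, for simplicity, that the amplified sequences have already been aligned to the reference fragment without gaps (so insertion or deletion polymorphisms are out of scope) and that ambiguous base calls such as `N` are ignored; the function name and sequences are hypothetical.

```python
def find_snp_sites(reference: str, others: list[str]) -> dict[int, set[str]]:
    """0-based positions where any aligned sequence differs from the reference,
    mapped to the set of nucleotides observed there (reference base included)."""
    sites: dict[int, set[str]] = {}
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("this sketch expects gap-free alignments of equal length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            # Count only clear base calls; ambiguous calls such as 'N' are skipped.
            if base != ref_base and base in "ACGT":
                sites.setdefault(pos, {ref_base}).add(base)
    return sites

sites = find_snp_sites("ACGTACGT", ["ACGTACGT", "ACTTACGT", "ACGTACGN"])
print({pos: sorted(bases) for pos, bases in sites.items()})  # {2: ['G', 'T']}
```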

The invention also includes a method for interrogating a 
polymorphic region of a human single nucleotide 
polymorphism of a target human, the method comprising: 

A) selecting a known human single nucleotide polymorphism 
for interrogation; 

B) identifying the sequence of at least one oligonucleotide 
that flanks the selected single nucleotide polymorphism; 
the identified sequence being of a length sufficient to 
permit the identification of primers capable of being used 
to effect the specific amplification of the flanking 
oligonucleotide and the polymorphism; 

C) using the primers to effect the amplification of the 
flanking oligonucleotide and the polymorphism of the 
single nucleotide polymorphism of the target human; and 

D) interrogating the single nucleotide polymorphism of the 
amplified polymorphism by genetic bit analysis. 
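For the primer-extension interrogation of step D, step B can be made concrete as follows. A SNP is written on one strand as its 5'-proximal flank, the site X, and its 3'-distal flank, and either strand may be interrogated: one primer copies the 3' end of the 5'-proximal flank, the other is the reverse complement of the start of the 3'-distal flank, and both end immediately before X. The 20-base default follows the preferred primer length given in this description; the function name is hypothetical, and practical primer design (melting temperature, uniqueness, secondary structure) is not addressed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def gba_primers(proximal: str, distal: str, length: int = 20) -> tuple[str, str]:
    """Return (primer_1, primer_2) whose 3' ends abut X on opposite strands.

    primer_1 equals the last `length` bases of the 5'-proximal flank; extended on
    the complementary strand it adds the base found at X itself.
    primer_2 is the reverse complement of the first `length` bases of the 3'-distal
    flank; extended on the written strand it adds the complement of X.
    """
    if len(proximal) < length or len(distal) < length:
        raise ValueError("flanking sequence shorter than the requested primer length")
    return proximal[-length:], revcomp(distal[:length])

print(gba_primers("AAAAACCCCC", "GGGTTTAAAC", length=5))  # ('CCCCC', 'AACCC')
```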




BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 illustrates the preferred method for cloning random genomic fragments. Genomic DNA is size fractionated and then introduced into a plasmid vector in order to obtain random clones. PCR primers are then designed and used to sequence the inserted genomic sequences.

Figure 2 illustrates data generated by the preferred method for identifying new polymorphic sequences, namely cycle sequencing of a random genomic fragment.

Figure 3 illustrates the RFLP method for screening random clones for polymorphic sequences. After the initial optimization of PCR conditions (top panel), amplified material is cleaved with several restriction enzymes, and the resulting profiles are analyzed (middle panels). A population study is then performed to determine allelic frequencies.

Figure 4 shows a graph of the probability that two individuals will have identical genotypes with given panels of genetic markers. The number of tests employed is plotted on the abscissa, while the cumulative probability of non-identity is plotted on the ordinate. The horizontal line indicates 0.95 probability of non-identity. Legend: o indicates the extrapolated prototype; x indicates 3 alleles (51%, 34%, 15%); triangle indicates 2 alleles (79%, 21%).
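The kind of curve plotted in Figure 4 can be approximated with a short calculation. This sketch is not the computation behind the figure, which is not described here; it assumes Hardy-Weinberg genotype frequencies, unrelated individuals, and independent markers that all share the same allele frequencies. Per marker, the chance that two individuals match is the sum of the squared genotype frequencies; over k markers, the cumulative probability of non-identity is 1 minus that value raised to the power k.

```python
import math
from itertools import combinations_with_replacement

def genotype_frequencies(allele_freqs: list[float]) -> list[float]:
    """Hardy-Weinberg genotype frequencies (p^2 for homozygotes, 2pq for heterozygotes)."""
    return [p * p if i == j else 2 * p * q
            for (i, p), (j, q) in combinations_with_replacement(enumerate(allele_freqs), 2)]

def identity_probability(allele_freqs: list[float]) -> float:
    """Chance that two unrelated individuals have the same genotype at one marker."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))

def cumulative_non_identity(allele_freqs: list[float], k: int) -> float:
    """Chance that two individuals differ at one or more of k independent markers."""
    return 1.0 - identity_probability(allele_freqs) ** k

def markers_needed(allele_freqs: list[float], target: float = 0.95) -> int:
    """Smallest k for which cumulative_non_identity reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(identity_probability(allele_freqs)))

print(markers_needed([0.79, 0.21]))        # 5
print(markers_needed([0.51, 0.34, 0.15]))  # 3
```

Under these assumptions, the per-marker identity probability is about 0.50 for the 79%/21% marker and about 0.24 for the 51%/34%/15% marker, so the three-allele panels cross the 0.95 line with fewer tests.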

Figure 5 shows a graph of the probability that given panels of 20 genetic markers will exclude a random alleged father in a paternity suit in which the mother is not in question. The number of tests employed is plotted on the abscissa, while the cumulative probability of exclusion is plotted on the ordinate. The horizontal line indicates 0.95 probability of exclusion. The legend is as in Figure 4.
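A per-marker probability of exclusion for the Figure 5 setting, in which the mother is not in question, can be obtained by enumerating every mother, true father, child and random-man genotype under Hardy-Weinberg proportions. This brute-force sketch assumes unrelated individuals, independent markers and no mutation or typing error, and it is not the calculation behind the figure. For a diallelic marker with allele frequencies p and q, it reproduces the known closed form pq(1 - pq).

```python
from itertools import combinations_with_replacement, product

def hw_genotypes(freqs: list[float]) -> list[tuple[tuple[int, int], float]]:
    """Unordered genotypes (i, j), i <= j, with their Hardy-Weinberg frequencies."""
    return [((i, j), freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j])
            for i, j in combinations_with_replacement(range(len(freqs)), 2)]

def exclusion_probability(freqs: list[float]) -> float:
    """Chance that a random man is excluded as father, given typed mother and child."""
    genotypes = hw_genotypes(freqs)
    total = 0.0
    for (mother, p_mother), (father, p_father) in product(genotypes, genotypes):
        # Each parent passes on either of its two alleles with probability 1/2.
        for m_allele, f_allele in product(mother, father):
            child = tuple(sorted((m_allele, f_allele)))
            p_trio = p_mother * p_father * 0.25
            for man, p_man in genotypes:
                # Excluded if no allele of the man, paired with a maternal allele, yields the child.
                if not any(tuple(sorted((m, a))) == child for m in mother for a in man):
                    total += p_trio * p_man
    return total

def cumulative_exclusion(per_marker: list[float]) -> float:
    """Chance that at least one of several independent markers excludes the man."""
    not_excluded = 1.0
    for pe in per_marker:
        not_excluded *= 1.0 - pe
    return 1.0 - not_excluded

print(round(exclusion_probability([0.79, 0.21]), 4))  # 0.1384
```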

Figure 6 uses the SNP identified in clone 177-2 to illustrate 
the organization of the sequences in Table 1. 

Figure 7 illustrates the preferred method for genotyping SNPs. The seven steps illustrate how GBA can be performed starting with a biological sample.
Figures 8A and 8B illustrate how horse parentage data appears at the microtiter plate level.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

I. The Single Nucleotide Polymorphisms of the Present Invention and The Advantages of their Use in Genetic Analysis

A. The Attributes of the Polymorphisms 

The particular gene sequences of interest to the present invention comprise "single nucleotide polymorphisms." A "polymorphism" is a variation in the DNA sequence of some members of a species. The genomes of animals and plants naturally undergo spontaneous mutation in the course of their continuing evolution (Gusella, J.F., Ann. Rev. Biochem. 55:831-854 (1986)). The majority of such mutations create polymorphisms. The mutated sequence and the initial sequence co-exist in the species' population. In some instances, such co-existence is in stable or quasi-stable equilibrium. In other instances, the mutation confers a survival or evolutionary advantage to the species, and accordingly, it may eventually (i.e., over evolutionary time) be incorporated into the DNA of every member of that species.

A polymorphism is thus said to be "allelic," in that, due to the 
existence of the polymorphism, some members of a species may 
have the unmutated sequence (i.e. the original "allele") whereas 

25 other members may have a mutated sequence (i.e. the variant or 
mutant "allele"). In the simplest case, only one mutated sequence 
may exist, and the polymorphism is said to be dialleiic. Diallelic 
polymorphisms are the most common and the preferred 
polymorphisms of the present invention. The occurrence of 

30 alternative mutations can give rise to trialleleic, etc. 
polymorphisms. An allele may be referred to by the nucleotide(s) 
that comprise the mutation. Thus, for example, in Table 1, clone 
177-2 (SEQ ID NO:1 and SEQ ID NO:2) illustrates the sequence of one 
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strand of a dialleiic polymorphism in which one allele has a "C" and 
the other allele has a "T* at the polymorphic site. 

The present invention is directed to a particular class of allelic polymorphisms, and to their use in genotyping a plant or animal. Such allelic polymorphisms are referred to herein as "single nucleotide polymorphisms," or "SNPs." "Single nucleotide polymorphisms" are defined by the following attributes. A central attribute of such a polymorphism is that it contains a polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of variation between allelic sequences. A second characteristic of an SNP is that its polymorphic site "X" is preferably preceded by and followed by "invariant" sequences of the allele. The polymorphic site of the SNP is thus said to lie "immediately" 3' to a "5'-proximal" invariant sequence, and "immediately" 5' to a "3'-distal" invariant sequence. Such sequences flank the polymorphic site.

As used herein, a sequence is said to be an "invariant" 
sequence of an allele if the sequence does not vary in the 
population of the species, and if mapped, would map to a 

20 "corresponding" sequence of the same allele in the genome of every 
member of the species population. Two sequences are said to be 
"corresponding" sequences if they are analogs of one another 
obtained from different sources. The gene sequences that encode 
hemoglobin in two humans illustrate "corresponding" allelic 

25 sequences. The definition of "corresponding alleles" provided 
herein is intended to clarify, but not to alter, the meaning of that 
term as understood by those of ordinary skill in the art. Each row 
of Table 1 shows the identity of the nucleotide of the polymorphic 
site of "corresponding" equine alleles, as well as the invariant 5'- 

30 proximal and 3'-distal sequences that are also attributes of that 
SNP. "Correspondiong alleles" are illustrated in Table 5 with 
regard to human alleles. Each row of Table 5 shows the identity of 
the nucleotide of the polymorphic site of "corresponding" human 
alleles, as well as the invariant 5'-proximal and 3'-distal 

35 sequences that are also attributes of that SNP. 
Since genomic DNA is double-stranded, each SNP can be defined in terms of either strand. Thus, for every SNP, one strand will contain an immediately 5'-proximal invariant sequence and the other will contain an immediately 3'-distal invariant sequence. In the preferred embodiment, wherein a SNP's polymorphic site, "X," is a single nucleotide, each strand of the double-stranded DNA of the SNP will contain both an immediately 5'-proximal invariant sequence and an immediately 3'-distal invariant sequence.
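The two-strand description can be captured in a small data structure. In this sketch, the class and field names are hypothetical and the sequences are illustrative rather than taken from Table 1. A SNP is stored as its 5'-proximal invariant sequence, the alternative nucleotides at X, and its 3'-distal invariant sequence. Describing the same SNP on the other strand swaps the flanks and takes reverse complements: the new 5'-proximal sequence is the reverse complement of the original 3'-distal sequence, and vice versa.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

@dataclass(frozen=True)
class SNP:
    proximal: str              # immediately 5'-proximal invariant sequence (5'->3')
    alleles: tuple[str, ...]   # alternative nucleotides at the polymorphic site X
    distal: str                # immediately 3'-distal invariant sequence (5'->3')

    def allele_sequences(self) -> list[str]:
        """Full sequence of each allele on this strand."""
        return [self.proximal + x + self.distal for x in self.alleles]

    def other_strand(self) -> "SNP":
        """The same SNP described on the complementary strand."""
        return SNP(proximal=revcomp(self.distal),
                   alleles=tuple(x.translate(COMPLEMENT) for x in self.alleles),
                   distal=revcomp(self.proximal))

snp = SNP("AAGG", ("C", "T"), "TTCA")    # illustrative C/T SNP, not from Table 1
print(snp.allele_sequences())            # ['AAGGCTTCA', 'AAGGTTTCA']
print(snp.other_strand())                # SNP(proximal='TGAA', alleles=('G', 'A'), distal='CCTT')
```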

Although the preferred SNPs of the present invention involve a substitution of one nucleotide for another at the SNP's polymorphic site, SNPs can also be more complex, and may comprise a deletion of a nucleotide from, or an insertion of a nucleotide into, one of two corresponding sequences. For example, a particular gene sequence may contain an A at a particular polymorphic site in some animals, whereas in other animals a single or multiple base deletion might be present at that site. Although the preferred SNPs of the present invention have both an invariant proximal sequence and an invariant distal sequence, SNPs may have only an invariant proximal or only an invariant distal sequence.

Nucleic acid molecules having a sequence complementary to that of an immediately 3'-distal invariant sequence of a SNP can, if extended in a "template-dependent" manner, form an extension product that would contain the SNP's polymorphic site. A preferred example of such a nucleic acid molecule is a nucleic acid molecule whose sequence is the same as that of a 5'-proximal invariant sequence of the SNP. "Template-dependent" extension refers to the capacity of a polymerase to mediate the extension of a primer such that the extended sequence is complementary to the sequence of a nucleic acid template. A "primer" is a single-stranded oligonucleotide or a single-stranded polynucleotide that is capable of being extended by the covalent addition of a nucleotide in a "template-dependent" extension reaction. In order to possess such a capability, the primer must have a 3'-hydroxyl terminus and be hybridized to a second nucleic acid molecule (i.e., the "template"). A primer is typically 11 bases or longer; most preferably, a primer is 20 bases, although primers of shorter or greater length may suffice. A "polymerase" is an enzyme that is capable of incorporating nucleoside triphosphates to extend a 3'-hydroxyl group of a nucleic acid molecule, if that molecule has hybridized to a suitable template nucleic acid molecule. Polymerase enzymes are discussed in Watson, J.D., In: Molecular Biology of the Gene, 3rd Ed., W.A. Benjamin, Inc., Menlo Park, CA (1977), which is incorporated herein by reference, and similar texts. Other polymerases, such as the large proteolytic fragment of the DNA polymerase I of the bacterium E. coli, commonly known as "Klenow" polymerase, E. coli DNA polymerase I, and bacteriophage T7 DNA polymerase, may also be used to perform the method described herein. Nucleic acids having the same sequence as that of the immediately 3'-distal invariant sequence of a SNP can be ligated in a template-dependent fashion to a primer that has the same sequence as that of the immediately 5'-proximal sequence and that has been extended by one nucleotide in a template-dependent fashion.

B. The Advantages of Using SNPs in Genetic Analysis

The single nucleotide polymorphic sites of the present invention can be used to analyze the DNA of any plant or animal. Such sites are particularly suitable for analyzing the genome of mammals, including humans, non-human primates, domestic animals (such as dogs, cats, etc.), farm animals (such as cattle, sheep, etc.) and other economically important animals, in particular, horses. They may, however, also be used with regard to other types of animals, particularly birds (such as chickens, turkeys, etc.). SNPs have several salient advantages over RFLPs, STRs and VNTRs.

First, SNPs occur at greater frequency (approximately 10- to 100-fold greater), and with greater uniformity, than RFLPs and VNTRs. The greater frequency of SNPs means that they can be more readily identified than the other classes of polymorphisms. The greater uniformity of their distribution permits the identification of SNPs "nearer" to a particular trait of interest. The combined effect of these two attributes makes SNPs extremely valuable. For example, if a particular trait (e.g., predisposition to cancer) reflects a mutation at a particular locus, then any polymorphism that is linked to the particular locus can be used to predict the probability that an individual will exhibit that trait.

The value of such a prediction is determined in part by the distance between the polymorphism and the locus. Thus, if the locus is located far from any repeated tandem nucleotide sequence motifs, VNTR analysis will be of very limited value. Similarly, if the locus is far from any detectable RFLP, an RFLP analysis would not be accurate. However, since the SNPs of the present invention are present approximately once every 300 bases in the mammalian genome, and exhibit uniformity of distribution, a SNP can, statistically, be found within 150 bases of any particular genetic lesion or mutation. Indeed, the particular mutation may itself be an SNP. In that case, where the locus has been sequenced, the variation at that nucleotide is itself determinative of the trait in question.
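The 150-base figure follows from simple arithmetic. Writing s for the average spacing between SNPs (about 300 bases, as stated above), an idealized, perfectly even spacing puts every position at most s/2 from the nearest SNP. If SNP positions are instead modeled as a random (Poisson) scatter with the same average spacing, the distance D from a given position to the nearest SNP on either side is exponentially distributed with rate 2/s, and therefore with mean s/2. Both models are idealizations assumed for this illustration.

```latex
d_{\max} = \frac{s}{2} = \frac{300}{2} = 150\ \text{bases},
\qquad
D \sim \mathrm{Exp}\!\left(\text{rate} = \tfrac{2}{s}\right),
\quad
\mathbb{E}[D] = \frac{s}{2} = 150\ \text{bases}.
```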

Second, SNPs are more stable than other classes of polymorphisms. Their spontaneous mutation rate is approximately 10^-9, approximately 1,000 times lower than that of VNTRs. Significantly, VNTR-type polymorphisms are characterized by high mutation rates.

Third, SNPs have the further advantage that their allelic frequency can be inferred from the study of relatively few representative samples. These attributes of SNPs permit a much higher degree of genetic resolution of identity, paternity exclusion, and analysis of an animal's predisposition for a particular genetic trait than is possible with either RFLP or VNTR polymorphisms.

Fourth, SNPs reflect the highest possible definition of genetic information: nucleotide position and base identity. Despite providing such a high degree of definition, SNPs can be detected more readily than either RFLPs or VNTRs, and with greater flexibility. Indeed, because DNA is double-stranded, the complementary strand of the allele can be analyzed to confirm the presence and identity of any SNP.

The flexibility with which an identified SNP can be characterized is a salient feature of SNPs. VNTR-type polymorphisms, for example, are most easily detected through size fractionation methods that can discern a variation in the number of the repeats. RFLPs are most easily detected by size fractionation methods following restriction digestion.

In contrast, SNPs can be characterized using any of a variety of methods. Such methods include the direct or indirect sequencing of the site, the use of restriction enzymes where the respective alleles of the site create or destroy a restriction site, the use of allele-specific hybridization probes, the use of antibodies that are specific for the proteins encoded by the different alleles of the polymorphism, or other biochemical interpretation.

The "Genetic Bit Analysis ("GBA") method disclosed by Goelet 
P. et al . (WO 92/15712, herein incorporated by reference), and 
discussed below, is a preferred method for detecting the single 
nucleotide polymorphisms of the present invention. GBA is a 

2 0 method of polymorphic site interrogation in which the nucleotide 

sequence information surrounding the site of variation in a target 
DNA sequence is used to design an oligonucleotide primer that is 
complementary to the region immediately adjacent to, but not 
including, the variable nucleotide in the target DNA. The target 

2 5 DNA template is selected from the biological sample and hybridized 
to the interrogating primer. This primer is extended by a single 
labeled dideoxynucleotide using DNA polymerase in the presence of 
two, and preferably all four chain terminating nucleoside 
triphosphate precursors. Cohen, D. et ai. (PCT Application 

30 WO91/02087) describes a related method of genotyping. 

Recently, several primer-guided nucleotide incorporation 
procedures for assaying polymorphic sites in DNA have been 
described (Komher, J. S. et al .. Nucl. Acids. Res. 17:7779-7784 
(1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvanen, A.- 

35 C. et al .. Genomics 8:684 - 692 (1990); Kuppuswamy, M.N. et al .. 
Proc. Natl. Acad. Sci. (U.S.A. 1 ! 88:1143-1147 (1991); Prezant, T.R. et 
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aL Hum. Mutat. 1:159-164 (1992); Ugozzoli, L et aL GATA 9:107- 
112 (1992); Nyren, P. et al .. Anal. Biochem. 208 :171-175 (1993)). 
These methods differ from GBA in that they all rely on the
incorporation of labeled deoxynucleotides to discriminate between
bases at a polymorphic site. In such a format, since the signal is
proportional to the number of deoxynucleotides incorporated,
polymorphisms that occur in runs of the same nucleotide can result
in signals that are proportional to the length of the run (Syvanen,
A.-C. et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Such a range of
locus-specific signals could be more complex to interpret,
especially for heterozygotes, than the simple, ternary (2:0,
1:1, or 0:2) class of signals produced by the GBA method. In
addition, for some loci, incorporation of an incorrect
deoxynucleotide can occur even in the presence of the correct
dideoxynucleotide (Kornher, J.S. et al., Nucl. Acids Res. 17:7779-
7784 (1989)). Such deoxynucleotide misincorporation events may
be due to the Km of the DNA polymerase for the mispaired deoxy-
substrate being comparable, in some sequence contexts, to the
relatively poor Km of even a correctly base-paired dideoxy-
substrate (Kornberg, A. et al., In: DNA Replication, 2nd Edition, W.H.
Freeman and Co., New York (1992); Tabor, S. et al., Proc. Natl. Acad.
Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would contribute to
the background noise in the polymorphic site interrogation.
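The difference between the two signal formats can be illustrated with a toy model. It assumes, hypothetically, that each labeled nucleotide is tested on its own, that `downstream` lists the bases that would be incorporated after the polymorphic site, and that every incorporated label contributes one unit of signal.

```python
# Toy comparison of labeled-dNTP and single-ddNTP signals at one site.
# Assumptions (hypothetical): each labeled nucleotide is tested on its own,
# `downstream` lists the bases that would be incorporated after the site,
# and every incorporated label gives one unit of signal.


def run_length(bases: str) -> int:
    """Count identical bases at the start of `bases`."""
    count = 1
    while count < len(bases) and bases[count] == bases[0]:
        count += 1
    return count


def allele_signals(alleles: tuple[str, str], downstream: str,
                   chain_terminating: bool) -> dict[str, int]:
    """Summed signal per labeled nucleotide for a diploid genotype."""
    signals: dict[str, int] = {}
    for base in alleles:
        if chain_terminating:
            units = 1  # a ddNTP stops the primer after one base
        else:
            # a dNTP keeps extending through a run of the same base
            units = run_length(base + downstream)
        signals[base] = signals.get(base, 0) + units
    return signals
```

With a downstream run of two A's (`"AAG"`), an A/T heterozygote gives `{"A": 3, "T": 1}` with labeled dNTPs but the balanced `{"A": 1, "T": 1}` with ddNTPs, and an A/A homozygote gives `{"A": 2}` with ddNTPs, i.e. the 2:0 class.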

II. Methods for Discovering Novel Polymorphic Sites

A preferred method for discovering polymorphic sites
involves comparative sequencing of genomic DNA fragments from a
number of haploid genomes. In the preferred embodiment,
illustrated in Figure 1, such sequencing is performed by preparing a
random genomic library that contains 0.5-3 kb fragments of DNA
derived from one member of a species. Sequences of these
recombinants are then used to facilitate PCR sequencing of a
number of randomly selected individuals of that species at the
same genomic loci.

From such genomic libraries (typically of approximately
50,000 clones), several hundred (200-500) individual clones are
purified, and the sequences of the termini of their inserts are
determined. Only a small amount of terminal sequence data (100-
200 bases) need be obtained to permit PCR amplification of the
cloned region. The purpose of the sequencing is to obtain enough
sequence information to permit the synthesis of primers suitable
for mediating the amplification of the equivalent fragments from
genomic DNA samples of other members of the species. Preferably,
such sequence determinations are performed using cycle
sequencing methodology.

The primers are used to amplify DNA from a panel of
randomly selected members of the target species. The number of
members in the panel determines the lowest frequency of the
polymorphisms that can be identified. Thus, if six members are
evaluated, a polymorphism that exists at a frequency of, for
example, 0.01 might not be identified. In an illustrative, but
oversimplified, mathematical treatment, a sampling of six
members would be expected to identify only those polymorphisms
that occur at a frequency greater than about 0.08 (i.e., 1.0 total
frequency divided by 6 members divided by 2 alleles per genome,
or 1/12). Thus, if one desires the identification of less frequent
polymorphisms, a greater number of panel members must be
evaluated.
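The panel-size arithmetic above can be made slightly less oversimplified with a binomial model. The sketch below reproduces the text's 1/(2 x 6) rule and, as an assumption not stated in the text, treats the 2n chromosomes of n unrelated panel members as independent draws. It then asks how often a minor allele appears at least twice, matching the "more than one haploid example" acceptance rule applied to sequencing differences in this section.

```python
from math import comb


def naive_min_frequency(panel_size: int) -> float:
    """The text's rule of thumb: 1.0 / members / 2 alleles per genome."""
    return 1.0 / panel_size / 2


def detection_probability(freq: float, panel_size: int,
                          min_copies: int = 2) -> float:
    """Chance that an allele at `freq` is carried by at least `min_copies`
    of the 2 * panel_size sampled chromosomes (binomial model; assumes
    unrelated members and independent chromosomes)."""
    n = 2 * panel_size
    p_fewer = sum(comb(n, k) * freq**k * (1 - freq) ** (n - k)
                  for k in range(min_copies))
    return 1.0 - p_fewer
```

With six members, an allele at frequency 0.01 is seen twice less than 1% of the time, whereas an allele at 0.5 is almost always seen. Enlarging the panel raises the detection probability for rarer alleles.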

Cycle sequence analysis (Mullis, K. et al., Cold Spring Harbor
Symp. Quant. Biol. 51:263-273 (1986); Erlich, H. et al., European
Patent Appln. 50,424; European Patent Appln. 84,796; European
Patent Appln. 258,017; European Patent Appln. 237,362;
Mullis, K., European Patent Appln. 201,184; Mullis, K. et al., U.S.
Patent No. 4,683,202; Erlich, H., U.S. Patent No. 4,582,788; and
Saiki, R. et al., U.S. Patent No. 4,683,194) is facilitated through
the use of automated DNA sequencing instruments and software
(Applied Biosystems, Inc.). Differences between sequences of
different animals can thereby be identified and confirmed by
inspecting the relevant portion of the chromatograms on the
computer screen. Differences are interpreted to reflect a DNA
polymorphism only if the data are available for both strands, and
the difference is present in more than one haploid example among
the population of animals tested. Figure 2 illustrates the preferred
method for identifying new polymorphic sequences, namely cycle
sequencing of a random genomic fragment. The PCR fragments from
five unrelated horses were electroeluted from acrylamide gels and
sequenced using repetitive cycles of thermostable Taq DNA
polymerase in the presence of a mixture of dNTPs and fluorescent
ddNTPs. The products were then separated and analyzed using an
automated DNA sequencing instrument of Applied Biosystems, Inc.
The data were analyzed using ABI software. Differences between
sequences of different animals were identified by the software and
confirmed by inspecting the relevant portion of the chromatograms
on the computer screen. Differences are presented as "DNA
Polymorphisms" only if the data are available for both strands and
the difference is present in more than one haploid example among
the five horses tested. The top panel shows an "A" homozygote, the
middle panel an "AT" heterozygote and the bottom panel a "T"
homozygote.

Despite the randomized nature of such a search for
polymorphisms, such sequencing and comparison of random DNA
clones is readily able to identify suitable polymorphisms. Indeed,
with respect to the horse, approximately 1 in 400 nucleotides
sequenced by these methods would be found to be the
polymorphic site of an SNP.
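The acceptance rules stated in this section (data for both strands, and the variant present in more than one haploid example) amount to a simple filter, and the ~1 in 400 rate gives a rough expectation to check a screen's yield against. In this sketch the record layout and the two-letter genotype calls (as in the A, AT and T panels of Figure 2) are hypothetical. The yield estimate additionally assumes, beyond the text, that polymorphic sites fall independently along the sequence (a Poisson model), so it offers order-of-magnitude guidance only.

```python
from dataclasses import dataclass
from math import exp

SNP_RATE = 1 / 400  # approximate horse rate reported in the text


@dataclass
class CandidateDifference:
    """One base difference seen in the chromatograms (hypothetical record)."""
    position: int
    seen_forward: bool   # difference present in the forward-strand read
    seen_reverse: bool   # difference present in the reverse-strand read
    haploid_copies: int  # haploid genomes in the panel carrying the variant


def haploid_copies(genotype_calls: list[str], variant: str) -> int:
    """Count variant alleles in two-letter calls such as 'AA', 'AT', 'TT'."""
    return sum(call.count(variant) for call in genotype_calls)


def is_reported_polymorphism(diff: CandidateDifference) -> bool:
    """Both strands must show the difference, in more than one haploid copy."""
    return diff.seen_forward and diff.seen_reverse and diff.haploid_copies > 1


def expected_snps(bases_compared: int, rate: float = SNP_RATE) -> float:
    """Expected number of polymorphic sites in `bases_compared` bases."""
    return bases_compared * rate


def p_at_least_one_snp(amplicon_length: int, rate: float = SNP_RATE) -> float:
    """Poisson estimate that an amplicon carries at least one SNP."""
    return 1.0 - exp(-amplicon_length * rate)
```

For instance, the five calls `["AA", "AT", "TT", "AA", "AA"]` contain three T alleles, so a T difference seen on both strands would be reported. At 1 in 400, a 500-base amplicon would carry at least one SNP about 71% of the time under the Poisson assumption.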

The discovery of polymorphic sites can alternatively be
conducted using the strategy outlined in Figure 3. In this
embodiment, the DNA sequence polymorphisms are identified by
comparing the restriction endonuclease cleavage profiles generated
by a panel of several restriction enzymes on products of the PCR
reaction from the genomic templates of unrelated members. Most
preferably, each of the restriction endonucleases used will have
a four-base recognition sequence, and will therefore cut the
amplified products a desirable number of times (in random
sequence, a given four-base site occurs about once every
4^4 = 256 bases).
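The restriction-profile strategy can be prototyped in silico. The sketch below cuts a linear amplicon with three real four-base cutters, HaeIII, MspI and AluI. These enzymes are chosen here only as examples; the text does not name enzymes. The sketch shows how a single-base change that destroys a site alters the fragment pattern.

```python
import re

# Example four-base cutters: (recognition site, cut offset on the top strand).
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "MspI": ("CCGG", 1),    # C^CGG
    "AluI": ("AGCT", 2),    # AG^CT
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (bp) after cutting a linear amplicon at every site."""
    site, offset = ENZYMES[enzyme]
    # A zero-width lookahead also reports overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Here `digest("AAGGCCTTGGCCAA", "HaeIII")` gives `[4, 6, 4]`, while the variant `"AAGGCCTTGACCAA"` (G to A in the second site) gives `[4, 10]`.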

The restriction digestion patterns obtained from the genomic
DNAs are preferably compared directly to the patterns obtained
from PCR products generated using the corresponding plasmid
templates. Such a comparison provides an internal control which
indicates that the amplified sequences from the genomic and
plasmid DNAs derive from equivalent loci. This control also allows
identification of primers that fortuitously amplify repeated
sequences, or multicopy loci, since these will generate many more
fragments from the genomic DNA templates than from the plasmid
templates.
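The internal control described above can be expressed as a comparison of band sets. The cutoff for calling a locus repeated (`excess_ratio`) is a hypothetical value, and band sizes are assumed to have already been read off the gel.

```python
def compare_patterns(genomic: list[int], plasmid: list[int],
                     excess_ratio: float = 2.0) -> str:
    """Compare one enzyme's digest of genomic- and plasmid-template products.

    Inputs are fragment sizes in bp; identical sizes co-migrate, so each
    pattern is reduced to its set of distinct bands.
    """
    genomic_bands, plasmid_bands = set(genomic), set(plasmid)
    if len(genomic_bands) > excess_ratio * len(plasmid_bands):
        # Many more genomic fragments suggest repeated or multicopy targets.
        return "repeated or multicopy locus"
    if genomic_bands == plasmid_bands:
        return "equivalent locus, no difference"
    # Extra or missing bands (e.g. a heterozygote) flag a candidate site.
    return "candidate polymorphism"
```

A real analysis would also tolerate sizing error on the gel; exact integer matching is a simplification of this sketch.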

III. Methods for Genotyping the Single Nucleotide
Polymorphisms of the Present Invention

Any of a variety of methods can be used to identify the
polymorphic site, "X," of a single nucleotide polymorphism of the
present invention. The preferred method of such identification
involves directly ascertaining the sequence of the polymorphic site
for each polymorphism being analyzed. This approach is thus
markedly different from the RFLP method, which analyzes patterns
of bands rather than the specific sequence of a polymorphism.

A. Sampling Methods

Nucleic acid specimens may be obtained from an individual of
the species that is to be analyzed using either "invasive" or "non-
invasive" sampling means. A sampling means is said to be
"invasive" if it involves the collection of nucleic acids from within
the skin or organs of an animal (including, especially, a murine, a
human, an ovine, an equine, a bovine, a porcine, a canine, or a feline
animal). Examples of invasive methods include blood collection,
semen collection, needle biopsy, pleural aspiration, etc. Examples
of such methods are discussed by Kim, C.H. et al. (J. Virol. 86:3879-
3882 (1992)); Biswas, B. et al. (Annals NY Acad. Sci. 590:582-583
(1990)); and Biswas, B. et al. (J. Clin. Microbiol. 29:2228-2233 (1991)).

In contrast, a "non-invasive" sampling means is one in which
the nucleic acid molecules are recovered from an internal or
external surface of the animal. Examples of such "non-invasive"
sampling means include "swabbing," collection of tears, saliva,
urine, fecal material, sweat or perspiration, etc. As used herein,
"swabbing" denotes contacting an applicator/collector ("swab")
containing or comprising an adsorbent material to a surface in a
manner sufficient to collect surface debris and/or dead or sloughed-
off cells or cellular debris. Such collection may be accomplished
by swabbing nasal, oral, rectal, vaginal or aural orifices, by
contacting the skin or tear ducts, by collecting hair follicles, etc.

Nasal swabs have been used to obtain clinical specimens for
PCR amplification (Olive, D.M. et al., J. Gen. Virol. 71:2141-2147
(1990); Wheeler, J.G. et al., Amer. J. Vet. Res. 52:1799-1803
(1991)). The use of hair follicles to identify VNTR polymorphisms
for paternity testing in horses has been described by Ellegren, H. et
al. (Animal Genetics 23:133-142 (1992)). The reference states that
a standardized testing system based on PCR-analyzed
microsatellite polymorphisms is likely to be an alternative to
blood typing for paternity testing.

A preferred swab for the collection of DNA will comprise a
solid support, at least a portion of which is designed to adsorb
DNA. The portion designed to adsorb DNA may be of a compressible
texture, such as a "foam rubber," or the like. Alternatively, it may
be an adsorptive fibrous composition, such as cotton, polyester,
nylon, or the like. In yet another embodiment, the portion designed
to adsorb DNA may be an abrasive material, such as a bristle or
brush, or a material having a rough surface. The portion of the swab
that is designed to adsorb DNA may be a combination of the above
textures and compositions (such as a compressible brush, etc.). The
swab will, preferably, be specially formed in a substantially
rod-like, arrow-like or mushroom-like shape, such that it will have a
segment that can be held by the collecting individual, and a tip or
end portion which can be placed into contact with the surface that
contains the sample DNA that is to be collected. In one
embodiment, the swab will be provided with a storage chamber,
such as a plastic or glass tube or cylinder, which may have one
open end, such as a test tube. Alternatively, the tube may have
two open ends, such that after swabbing, the collector can pull on
one end of the swab so as to cause the other end of the swab to be
withdrawn into the tube. In yet another embodiment, the tube may
have two open ends, such that after swabbing, the tube can be
converted into a column to assist in the further processing of the
collected DNA. In one embodiment, the end or ends of the storage
chamber are self-sealing after swabbing has been accomplished.

The swab or the storage chamber may contain antimicrobial 
agents at concentrations sufficient to prevent the proliferation of 
microbes (bacteria, yeast, molds, etc.) during subsequent storage 
or handling. 

In one embodiment, the swab or storage chamber will contain
a chromogenic reagent which reacts to the presence of DNA to
yield a detectable signal that can be identified at the time of
sample collection. Most preferably, such a reagent will comprise a
minimum concentration "open-end point" assay for DNA. Such an
assay is capable of detecting concentrations of nucleic acids that
range from the minimum detection level of the assay to the
maximum assay saturation level of the assay. This saturation level
is adjustable, and can be increased by decreasing the time of
reaction. Preferred chromogenic reagents include anti-DNA
antibodies that are conjugated to enzymes, diaminopimelic acid,
etc.

B. Amplification-Based Analysis

The detection of polymorphic sites in a sample of DNA may be
facilitated through the use of DNA amplification methods. Such
methods specifically increase the concentration of sequences that
span the polymorphic site, or include that site and sequences
located either distal or proximal to it. Such amplified molecules
can be readily detected by gel electrophoresis or other means.

The most preferred method of achieving such amplification 
employs PCR, using primer pairs that are capable of hybridizing to 
the proximal sequences that define a polymorphism in its double- 
stranded form. 

In lieu of PCR, alternative methods, such as the "Ligase Chain
Reaction" ("LCR"), may be used (Barany, F., Proc. Natl. Acad. Sci.
(U.S.A.) 88:189-193 (1991)). LCR uses two pairs of oligonucleotide
probes to exponentially amplify a specific target. The sequences of
each pair of oligonucleotides are selected to permit the pair to
hybridize to abutting sequences of the same strand of the target.
Such hybridization forms a substrate for a template-dependent
ligase. As with PCR, the resulting products thus serve as a
template in subsequent cycles and an exponential amplification of
the desired sequence is obtained.

In accordance with the present invention, LCR can be 
performed with oligonucleotides having the proximal and distal 
sequences of the same strand of a polymorphic site. In one 
embodiment, either oligonucleotide will be designed to include the 
actual polymorphic site of the polymorphism. In such an 
embodiment, the reaction conditions are selected such that the 
oligonucleotides can be ligated together only if the target molecule 
either contains or lacks the specific nucleotide that is 
complementary to the polymorphic site present on the 
oligonucleotide. 
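The allele-specific form of LCR described above can be illustrated with a string model. The sketch reduces hybridization and ligation to exact sequence matching, and it assumes the oligonucleotide carrying the polymorphic site places that base at its 3' terminus, at the ligation junction. Both are simplifying assumptions of the sketch; the helper names are hypothetical.

```python
def ligates(template_copy: str, left: str, right: str) -> bool:
    """True if two abutting oligonucleotides could be joined on a target.

    Sequences are 5'->3' on the strand the oligos copy (assumed convention);
    the polymorphic base is the 3'-terminal base of `left`. Under stringent
    conditions ligase is assumed to need a perfect duplex across the junction,
    which here means the joined oligos occur verbatim in the sequence.
    """
    return (left + right) in template_copy


def call_alleles(haplotypes: list[str], left_by_allele: dict[str, str],
                 right: str) -> list[str]:
    """Alleles whose allele-specific left oligo ligates on any haplotype."""
    return sorted(allele for allele, left in left_by_allele.items()
                  if any(ligates(h, left, right) for h in haplotypes))
```

With left oligos `{"A": "CCTGAA", "T": "CCTGAT"}` and the common right oligo `"GTCA"`, a sample carrying only `"CCTGAAGTCA"` is called A, and a sample carrying both `"CCTGAAGTCA"` and `"CCTGATGTCA"` is called A and T.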

In an alternative embodiment, the oligonucleotides will not 
include the polymorphic site, such that when they hybridize to the 
target molecule, a "gap" is created (see, Segev, D., PCT Application 
WO 90/01069). This gap is then "filled" with complementary dNTPs
(as mediated by DNA polymerase), or by an additional pair of 
oligonucleotides. Thus, at the end of each cycle, each single strand 
has a complement capable of serving as a target during the next 
cycle and exponential amplification of the desired sequence is 
obtained. 

The "Oligonucleotide Ligation Assay" ("OLA") (Landegren, U. et
al., Science 241:1077-1080 (1988)) shares certain similarities
with LCR and may also be adapted for use in polymorphic analysis. 
The OLA protocol uses two oligonucleotides which are designed to 
be capable of hybridizing to abutting sequences of a single strand 
of a target. OLA, like LCR, is particularly suited for the detection 
of point mutations. Unlike LCR, however, OLA results in "linear" 
rather than exponential amplification of the target sequence.

Nickerson, D.A. et al. have described a nucleic acid detection
assay that combines attributes of PCR and OLA (Nickerson, D.A. et
al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990)). In this
method, PCR is used to achieve the exponential amplification of
target DNA, which is then detected using OLA. In addition to
requiring multiple, and separate, processing steps, one problem
associated with such combinations is that they inherit all of the
problems associated with PCR and OLA.
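The contrast drawn above between exponential amplification (LCR) and linear amplification (OLA) can be quantified with an idealized model. The per-cycle `efficiency` parameter is a simplification introduced here, not a value from the text. So are the assumptions that every LCR ligation product is as good a template as the original target, and that OLA is also thermally cycled.

```python
def lcr_products(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ligated products after `cycles` of LCR: products become templates,
    so the template pool grows by a factor (1 + efficiency) per cycle."""
    return targets * ((1 + efficiency) ** cycles - 1)


def ola_products(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ligated products after `cycles` of OLA: only the original target
    strands serve as templates, so products accumulate linearly."""
    return targets * efficiency * cycles
```

With perfect efficiency, 20 cycles turn one target into 1,048,575 LCR products but only 20 OLA products.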

Schemes based on ligation of two (or more) oligonucleotides
in the presence of nucleic acid having the sequence of the resulting
"di-oligonucleotide," thereby amplifying the di-oligonucleotide, are
also known (Wu, D.Y. et al., Genomics 4:560 (1989)), and may be
readily adapted to the purposes of the present invention.

Other known nucleic acid amplification procedures, such as
transcription-based amplification systems (Malek, L.T. et al., U.S.
Patent 5,130,238; Davey, C. et al., European Patent Application
329,822; Schuster et al., U.S. Patent 5,169,766; Miller, H.I. et al.,
PCT Appln. WO 89/06700; Kwoh, D. et al., Proc. Natl. Acad. Sci.
(U.S.A.) 86:1173 (1989); Gingeras, T.R. et al., PCT Application WO
88/10315), or isothermal amplification methods (Walker, G.T. et
al., Proc. Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)), may also be
used.

C. Preparation of Single-Stranded DNA 

The direct analysis of the sequence of an SNP of the present
invention can be accomplished using either the "dideoxy-mediated
chain termination method," also known as the "Sanger method"
(Sanger, F. et al., J. Molec. Biol. 94:441 (1975)), or the "chemical
degradation method," also known as the "Maxam-Gilbert method"
(Maxam, A.M. et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977)),
both references herein incorporated by reference. Methods for
sequencing DNA using either the dideoxy-mediated method or the
Maxam-Gilbert method are widely known to those of ordinary skill
in the art. Such methods are, for example, disclosed in Sambrook,
J. et al., Molecular Cloning, a Laboratory Manual, 2nd Edition, Cold
Spring Harbor Press, Cold Spring Harbor, New York (1989), and in
Zyskind, J.W. et al., Recombinant DNA Laboratory Manual, Academic
Press, Inc., New York (1988), both herein incorporated by reference.
Where a nucleic acid sample contains double-stranded DNA 
(or RNA), or where a double-stranded nucleic acid amplification
protocol (such as PCR) has been employed, it is generally desirable 
to conduct such sequence analysis after treating the double- 
stranded molecules so as to obtain a preparation that is enriched 
for, and preferably predominantly, only one of the two strands. 

The simplest method for generating single-stranded DNA
molecules from double-stranded DNA is denaturation using heat or
alkali treatment.

Single-stranded DNA molecules may also be produced using
the single-stranded DNA bacteriophage M13 (Messing, J. et al., Meth.
Enzymol. 101:20 (1983); see also, Sambrook, J. et al., In:
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY (1989)).

Several alternative methods can be used to generate single-
stranded DNA molecules. Gyllensten, U. et al. (Proc. Natl. Acad.
Sci. (U.S.A.) 85:7652-7656 (1988)) and Mihovilovic, M. et al.
(BioTechniques 7(1):14 (1989)) describe a method, termed
"asymmetric PCR," in which the standard "PCR" method is conducted
using primers that are present in different molar concentrations.
Higuchi, R.G. et al. (Nucleic Acids Res. 17:5865 (1985)) exemplifies
an additional method for generating single-stranded amplification
products. The method entails phosphorylating the 5'-terminus of
one strand of a double-stranded amplification product, and then
permitting a 5'->3' exonuclease (such as λ exonuclease) to
preferentially degrade the phosphorylated strand.

Other methods have also exploited the nuclease-resistant
properties of phosphorothioate derivatives in order to generate
single-stranded DNA molecules (Benkovic et al., U.S. Patent No.
4,521,509 (June 4, 1985); Sayers, J.R. et al., Nucl. Acids Res.
16:791-802 (1988); Eckstein, F. et al., Biochemistry 15:1685-1691
(1976); Ott, J. et al., Biochemistry 26:8237-8241 (1987)).

A discussion of the relative advantages and disadvantages of
such methods of producing single-stranded molecules is provided
by Nikiforov, T. (U.S. patent application serial no. 08/005,061,
herein incorporated by reference).

Most preferably, such single-stranded molecules will be
produced using the methods described by Nikiforov, T. (U.S. patent
application serial no. 08/005,061, herein incorporated by
reference). In brief, these methods employ nuclease-resistant
nucleotide derivatives, and incorporate such derivatives, by
chemical synthesis or enzymatic means, into primer molecules, or
their extension products, in place of naturally occurring
nucleotides.

Suitable nucleotide derivatives include derivatives in which
one or two of the non-bridging oxygens of the phosphate moiety of
a nucleotide have been replaced with a sulfur-containing group
(especially a phosphorothioate), an alkyl group (especially a methyl
or ethyl group), a nitrogen-containing group (especially an
amine), and/or a selenium-containing group, etc.

Phosphorothioate deoxyribonucleotide or ribonucleotide
derivatives (e.g., a nucleoside 5'-O-(1-thiotriphosphate)) are the
most preferred nucleotide derivatives. Any of a variety of
chemical methods may be used to produce such phosphorothioate
derivatives (see, for example, Zon, G. et al., Anti-Canc. Drug Des.
6:539-568 (1991); Kim, S.G. et al., Biochem. Biophys. Res. Commun.
179:1614-1619 (1991); Vu, H. et al., Tetrahedron Lett. 32:3005-
3008 (1991); Taylor, J.W. et al., Nucl. Acids Res. 13:8749-8764
(1985); Eckstein, F. et al., Biochemistry 15:1685-1691 (1976); Ott,
J. et al., Biochemistry 26:8237-8241 (1987); Ludwig, J. et al., J.
Org. Chem. 54:631-635 (1989), all herein incorporated by
reference). Phosphorothioate nucleotide derivatives can also be
obtained commercially from Amersham or Pharmacia.

Importantly, the selected nucleotide derivative must be
suitable for in vitro primer-mediated extension and provide
nuclease resistance to the region of the nucleic acid molecule in
which it is incorporated. In the most preferred embodiment, it
must confer resistance to exonucleases that attack double-
stranded DNA from the 5'-end (5'->3' exonucleases). Examples of
such exonucleases include bacteriophage T7 gene 6 exonuclease
("T7 exonuclease") and the bacteriophage lambda exonuclease ("λ
exonuclease"). Both T7 exonuclease and λ exonuclease are inhibited
to a significant degree by the presence of phosphorothioate bonds
so as to allow the selective degradation of one of the strands.
However, any double-strand-specific 5'->3' exonuclease can be used
for this process, provided that its activity is affected by the
presence of the bonds of the nuclease-resistant nucleotide
derivatives. The preferred enzyme when using phosphorothioate
derivatives is the T7 gene 6 exonuclease, which shows maximal
enzymatic activity in the buffers used for many DNA-dependent
polymerases, including Taq polymerase. The 5'->3'
exonuclease-resistant properties of phosphorothioate derivative-
containing DNA molecules are discussed, for example, in Kunkel,
T.A. (In: Nucleic Acids and Molecular Biology, Vol. 2, 124-135
(Eckstein, F. et al., eds.), Springer-Verlag, Berlin (1988)). The
3'->5' exonuclease-resistant properties of phosphorothioate
nucleotide-containing nucleic acid molecules are disclosed in
Putney, S.D. et al. (Proc. Natl. Acad. Sci. (U.S.A.) 78:7350-7354
(1981)) and Gupta, A.P. et al. (Nucl. Acids Res. 12:5897-5911
(1984)).

In addition to being resistant to such exonucleases, nucleic
acid molecules that contain phosphorothioate derivatives at
restriction endonuclease cleavage recognition sites are resistant
to such cleavage. Taylor, J.W. et al. (Nucl. Acids Res. 13:8749-
8784 (1985)) discusses the endonuclease-resistant properties of
phosphorothioate nucleotide-containing nucleic acid molecules.

The nuclease resistance of phosphorothioate bonds has been
utilized in a DNA amplification protocol (Walker, G.T. et al., Proc.
Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)). In the Walker et al.
method, phosphorothioate nucleotide derivatives are installed
within a restriction endonuclease recognition site in one strand of
a double-stranded DNA molecule. The presence of the
phosphorothioate nucleotide derivatives protects that strand from
cleavage, and thus results in the nicking of the unprotected strand
by the restriction endonuclease. Amplification is accomplished by
cycling the nicking and polymerization of the strands.

Similarly, this resistance to nuclease attack has been used as
the basis for a modified "Sanger" sequencing method (Labeit, S. et
al., DNA 5:173-177 (1986)). In the Labeit et al. method, 35S-
labeled phosphorothioate nucleotide derivatives were employed in
lieu of the dideoxynucleotides of the "Sanger" method.

In the most preferred embodiment, the phosphorothioate
derivative is included in the primer. The nucleotide derivative may
be incorporated into any position of the primer, but will preferably
be incorporated at the 5'-terminus of the primer, most preferably
with the derivatives adjacent to one another. Preferably, the primer
molecules will be approximately 25 nucleotides in length, and
contain from about 4% to about 100%, more preferably from about 4%
to about 40%, and most preferably about 16%, phosphorothioate
residues (as compared to total residues). The nucleotides may be
incorporated into any position of the primer, and may be adjacent
to one another, or interspersed across all or part of the primer.
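For the approximately 25-nucleotide primers described above, the stated percentages translate into small integer counts. The sketch below does that arithmetic and writes a primer with 5'-terminal phosphorothioate linkages in the widely used "*" shorthand, a notation chosen here rather than taken from the text. It also applies the limit of about 10 phosphorothioate bonds that the text gives for using unmodified PCR conditions. The text uses "residues" and "bonds" loosely; the sketch counts backbone linkages when annotating.

```python
def thioate_residue_count(primer_length: int, fraction: float) -> int:
    """Number of phosphorothioate residues for a given fraction of the primer."""
    return round(primer_length * fraction)


def annotate_5prime_thioates(primer: str, n_bonds: int) -> str:
    """Mark the first `n_bonds` backbone linkages from the 5' end with '*',
    a common shorthand for a phosphorothioate bond (notation assumed here)."""
    if not 0 <= n_bonds < len(primer):
        raise ValueError("a primer of length L has L - 1 internal linkages")
    head = "*".join(primer[: n_bonds + 1])
    return head + primer[n_bonds + 1:]


def needs_pcr_adjustment(n_bonds: int, limit: int = 10) -> bool:
    """Text: up to about 10 bonds keeps the standard PCR protocol usable."""
    return n_bonds > limit
```

For a 25-mer, 4%, 16% and 40% correspond to 1, 4 and 10 residues; `annotate_5prime_thioates("ACGTACGT", 3)` gives `"A*C*G*TACGT"`.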

In one embodiment, the present invention can be used in
concert with an amplification protocol, for example, PCR. In this
embodiment, it is preferred to limit the number of
phosphorothioate bonds of the primers to about 10 (or
approximately half of the length of the primers), so that the
primers can be used in a PCR reaction without any changes to the
PCR protocol that has been established for non-modified primers.
When the primers contain more phosphorothioate bonds, the PCR
conditions may require adjustment, especially of the annealing
temperature, in order to optimize the reaction.

The incorporation of such nucleotide derivatives into DNA or
RNA can be accomplished enzymatically, using a DNA polymerase
(Vosberg, H.P. et al., Biochemistry 16:3633-3640 (1977); Burgers,
P.M.J. et al., J. Biol. Chem. 254:6889-6893 (1979); Kunkel, T.A., In:
Nucleic Acids and Molecular Biology, Vol. 2, 124-135 (Eckstein, F.
et al., eds.), Springer-Verlag, Berlin (1988); Olsen, D.B. et al., Proc.
Natl. Acad. Sci. (U.S.A.) 87:1451-1455 (1990); Griep, M.A. et al.,
Biochemistry 29:9006-9014 (1990); Sayers, J.R. et al., Nucl. Acids
Res. 16:791-802 (1988)). Alternatively, phosphorothioate
nucleotide derivatives can be incorporated synthetically into an
oligonucleotide (Zon, G. et al., Anti-Canc. Drug Des. 6:539-568
(1991)).

The primer molecules are permitted to hybridize to a 
complementary target nucleic acid molecule, and are then extended, 
preferably via a polymerase, to form an extension product. The 
presence of the phosphorothioate nucleotides in the primers 
renders the extension product resistant to nuclease attack. As 
indicated, the amplification products containing phosphorothioate
or other suitable nucleotide derivatives are substantially resistant
to "elimination" (i.e., degradation) by 5'->3' exonucleases such as
T7 exonuclease or λ exonuclease, and thus a 5'->3' exonuclease will
be substantially incapable of further degrading a nucleic acid
molecule once it has encountered a phosphorothioate residue.

Since the target molecule lacks nuclease-resistant residues,
the incubation of the extension product and its template (the
target) in the presence of a 5'->3' exonuclease results in the
destruction of the template strand, and thereby achieves the
preferential production of the desired single strand.
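The strand-selection logic above can be illustrated with a deliberately simplified model. For illustration only, it assumes that digestion proceeds residue by residue from the 5' end until the first phosphorothioate-containing residue. It ignores the double-strand specificity of T7 and λ exonuclease and any partial read-through. The function names are hypothetical.

```python
def exo_5to3(strand: str, thioate_positions: set[int]) -> str:
    """Return what survives a 5'->3' exonuclease acting on `strand`.

    `strand` is written 5'->3'; `thioate_positions` holds 0-based indices
    (from the 5' end) of phosphorothioate-containing residues. The enzyme
    removes residues one at a time from the 5' end and stalls at the first
    protected residue.
    """
    for index in range(len(strand)):
        if index in thioate_positions:
            return strand[index:]
    return ""  # no protected residue: the strand is fully degraded


def make_single_stranded(extension_product: str, protected_5prime: int,
                         template: str) -> tuple[str, str]:
    """Digest a duplex whose extension product carries a protected 5' primer."""
    product_left = exo_5to3(extension_product, set(range(protected_5prime)))
    template_left = exo_5to3(template, set())
    return product_left, template_left
```

In this model an extension product whose first four residues are protected survives intact, while its unprotected template is removed entirely, leaving the desired single strand.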

D. Solid Phase Attachment of DNA 

The preferred method of determining the identity of the 
polymorphic site of a polymorphism involves nucleic acid 
hybridization. Although such hybridization can be performed in 
solution (Berk, A.J. et al., Cell 12:721-732 (1977); Hood, L.E. et al.,
In: Molecular Biology of Eukaryotic Cells: A Problems Approach,
Menlo Park, CA: Benjamin-Cummings (1975); Wetmur, J.G.,
Hybridization and Renaturation Kinetics of Nucleic Acids, Ann. Rev.
Biophys. Bioeng. 5:337-361 (1976); Itakura, K. et al., Ann. Rev.
Biochem. 53:323-356 (1984)), it is preferable to employ a solid-
phase hybridization assay (see, Saiki, R.K. et al., Proc. Natl. Acad.
Sci. (U.S.A.) 86:6230-6234 (1989); Gilham et al., J. Amer. Chem.
Soc. 86:4982 (1964); and Kremsky et al., Nucl. Acids Res.
15:3131-3139 (1987)).

Any of a variety of methods can be used to immobilize
oligonucleotides to the solid support. One of the most widely used
methods to achieve such an immobilization of oligonucleotide
primers for subsequent use in hybridization-based assays consists
of the non-covalent coating of these solid phases with streptavidin
or avidin and the subsequent immobilization of biotinylated
oligonucleotides (Holmstrom, K. et al., Anal. Biochem. 209:278-283
(1993)). Another known method (Running, J.A. et al., BioTechniques
8:276-277 (1990); Newton, C.R. et al., Nucl. Acids Res. 21:1155-
1162 (1993)) requires the pre-coating of the polystyrene or glass
solid phases with poly-L-Lys or poly(L-Lys, Phe), followed by the
covalent attachment of either amino- or sulfhydryl-modified
oligonucleotides using bifunctional crosslinking reagents. Both
methods have the disadvantage of requiring the use of modified
oligonucleotides as well as a pre-treatment of the solid phase.

In another published method (Kawai, S. et al., Anal. Biochem.
209:63-69 (1993)), short oligonucleotide probes were ligated
together to form multimers and these were ligated into a phagemid
vector. Following in vitro amplification and isolation of the
single-stranded form of these phagemids, they were immobilized
onto polystyrene plates and fixed by UV irradiation at 254 nm. The
probes immobilized in this way were then used to capture and
detect a biotinylated PCR product.

A method for the direct covalent attachment of short, 5'-
phosphorylated primers to chemically modified polystyrene plates
("Covalink" plates, Nunc) has also been published (Rasmussen, S.R.
et al., Anal. Biochem. 198:138-142 (1991)). The covalent bond
between the modified oligonucleotide and the solid phase surface
is introduced by condensation with a water-soluble carbodiimide.
This method is claimed to assure a predominantly 5'-attachment of
the oligonucleotides via their 5'-phosphates; however, it requires
the use of specially prepared, expensive plates.

Most preferably, such immobilization of oligonucleotides
(preferably between 15 and 30 bases) is accomplished using a
method that can be used directly, without the need for any pre-
treatment of commercially available polystyrene microwell plates
(ELISA plates) or microscope glass slides. Since 96-well
polystyrene plates are widely used in ELISA tests, there has been
significant interest in the development of methods for the
immobilization of short oligonucleotide primers to the wells of
these plates for subsequent hybridization assays. Also of interest
is a method for the immobilization to microscope glass slides,
since the latter are used in the so-called Slide Immunoenzymatic
Assay (SIA) (de Macario, E.C. et al., BioTechniques 3:138-145
(1985)).

The solid support can be glass, plastic, paper, etc. The
support can be fashioned as a bead, dipstick, test tube, etc. In a
preferred embodiment, the support will be a microtiter dish, having
a multiplicity of wells. The conventional 96-well microtiter
dishes used in diagnostic laboratories and in tissue culture are a
preferred support. The use of such a support allows the
simultaneous determination of a large number of samples and
controls, and thus facilitates the analysis. Automated delivery
systems can be used to provide reagents to such microtiter dishes.
Similarly, spectrophotometric methods can be used to analyze the
polymorphic sites, and such analysis can be conducted using
automated spectrophotometers.

One aspect of the present invention concerns a method for
immobilizing oligonucleotides for such analysis. In accordance
with the method, any of a number of commercially available
polystyrene plates can be used directly for the immobilization,
provided that they have a hydrophilic surface. Examples of suitable
plates include the Immulon 4 plates (Dynatech) and the Maxisorp
plates (Nunc). The immobilization of the oligonucleotides to the
plates is achieved simply by incubation in the presence of a
suitable salt. No immobilization takes place in the absence of a
salt, i.e., when the oligonucleotide is present in a water solution.
Examples of suitable salts are: 50-250 mM NaCl; 30-100 mM
1-ethyl-3-(3'-dimethylaminopropyl)carbodiimide hydrochloride (EDC),
pH 6.8; 50-150 mM octyldimethylamine hydrochloride, pH 7.0; and
50-250 mM tetramethylammonium chloride. The immobilization is
achieved by incubation, preferably at room temperature for 3 to 24
hours. After such incubation, the plates are washed, preferably
with a solution of 10 mM Tris HCl, pH 7.5, containing 150 mM NaCl
and 0.05% (v/v) Tween-20 (TNTw). The latter ingredient serves the
important role of blocking all free oligonucleotide binding sites
still present on the polystyrene surface, so that no nonspecific
binding of oligonucleotides can take place during the subsequent
hybridization steps. Using radioactively labeled oligonucleotides,
the amount of immobilized oligonucleotide per well was
determined to be at least 500 fmoles. The oligonucleotides are
immobilized to the surface of the plate with sufficient stability
that they can only be removed by prolonged incubation with 0.5 M
NaOH solutions at elevated temperatures. No oligonucleotide is
removed by washing the plate with water, TNTw, PBS, 1.5 M
NaCl, or other similar solutions.
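
The immobilization conditions above amount to a small table of salts, concentration ranges and incubation times. The Python sketch below encodes that table so that a planned incubation can be checked against it before plates are set up. The dictionary, the function name `immobilization_deviations` and the 0.1 pH-unit tolerance are illustrative assumptions, not part of the described method; the ranges themselves are the ones listed above.

```python
# Hypothetical encoding of the listed immobilization salts:
# salt -> (minimum mM, maximum mM, stated pH or None)
LISTED_SALTS = {
    "NaCl": (50, 250, None),
    "EDC": (30, 100, 6.8),
    "octyldimethylamine hydrochloride": (50, 150, 7.0),
    "tetramethylammonium chloride": (50, 250, None),
}


def immobilization_deviations(salt, conc_mm, hours, ph=None, ph_tolerance=0.1):
    """Return a list of departures from the listed conditions (empty if none).

    ph_tolerance is an illustrative assumption, not a value from the text.
    """
    if conc_mm <= 0:
        # The text reports no immobilization from salt-free water.
        return ["no salt: oligonucleotide will not immobilize"]
    if salt not in LISTED_SALTS:
        return [f"{salt!r} is not one of the listed salts"]
    low, high, stated_ph = LISTED_SALTS[salt]
    deviations = []
    if not low <= conc_mm <= high:
        deviations.append(f"{conc_mm} mM is outside {low}-{high} mM")
    if stated_ph is not None and ph is not None and abs(ph - stated_ph) > ph_tolerance:
        deviations.append(f"pH {ph} differs from the stated pH {stated_ph}")
    if not 3 <= hours <= 24:
        deviations.append(f"{hours} h is outside the preferred 3-24 h")
    return deviations


print(immobilization_deviations("NaCl", 150, 12))        # []
print(immobilization_deviations("EDC", 120, 2, ph=6.8))  # two deviations
```

An empty list only means the plan sits inside the stated ranges; it says nothing about plate surface chemistry, which the text requires to be hydrophilic.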
The immobilized oligonucleotides can be used to capture
specific DNA sequences by hybridization. The hybridization is
usually carried out in a solution containing 1.5 M NaCl and 10 mM
EDTA, for 15 to 30 minutes at room temperature. Other
hybridization conditions can also be used. More than 400 fmoles of
a specific DNA sequence were found to hybridize to the immobilized
oligonucleotide in one well. This DNA is bound to the initially
immobilized oligonucleotide only via Watson-Crick hydrogen bonds,
and can be easily removed from the wells by a brief wash with a
0.1 M NaOH solution, without removing the initially attached
oligonucleotide from the plate. If the captured DNA fragment is
nonradioactively labeled, e.g., with a biotin residue, the detection
can be carried out using a suitable enzyme-linked assay.

Although no modifications have to be introduced into the
synthetic oligonucleotides, the method also allows for the
immobilization of labeled (e.g., biotinylated) oligonucleotides, if
desired. The amount of oligonucleotide that can be immobilized in
a single well of an ELISA plate by this method is at least 500
fmoles. The oligonucleotides thus immobilized onto the solid phase
can hybridize to suitable templates and also participate in
enzymatic reactions such as template-directed extensions and
ligations.
For high-volume testing applications, it is desirable to use
non-radioactive detection methods. Thus, the use of haptenated
dideoxynucleotides is preferred; the use of biotinylated
dideoxynucleotides is particularly preferred, as such modification
would render the incorporated base detectable by the standard
avidin (or streptavidin) enzyme conjugates used in ELISA assays.
The biotinylated ddNTPs are preferably prepared by reacting the
four respective (3-aminopropyn-1-yl)nucleoside triphosphates
with sulfosuccinimidyl 6-(biotinamido)hexanoate. Thus,
(3-aminopropyn-1-yl)nucleoside 5'-triphosphates are prepared as
described by Hobbs, F.W. (J. Org. Chem. 54:3420-3422 (1989)) and
by Hobbs, F.W. et al. (U.S. Patent No. 5,047,519). The
(3-aminopropyn-1-yl)nucleoside 5'-triphosphate (50 µmol) is dissolved
in 1 ml of pH 7.6, 1 M aqueous triethylammonium bicarbonate
(TEAB). Sulfosuccinimidyl 6-(biotinamido)hexanoate sodium salt
(Pierce, 55.7 mg, 100 µmol) is added and the solution is heated to
50°C in a stoppered tube for 2 hr. The reaction mixture is diluted
to 10 ml with water and applied to a DEAE-Sephadex A-25-120
column (1.6 x 19 cm). The column is eluted with a linear gradient
of pH 7.6 aqueous TEAB (0.1 M to 1.0 M) and the eluent is monitored
at 270 nm. The late-eluting major peak is collected, stripped, and
co-evaporated with ethanol. The crude product, containing
biotinylated nucleoside triphosphate and, in some cases,
contaminating starting material, is further purified by reverse
phase column chromatography (Baker C-18 packing, 2 x 12 cm bed).
The material is loaded in 0.1 M pH 7.6 TEAB and eluted with a step
gradient of acetonitrile in 0.1 M pH 7.6 TEAB (0% to 36%, 2%
increments, 8 ml/step). In all cases, the biotinylated product is
more strongly retained than, and cleanly resolved from, the starting
material. Product-containing fractions are pooled, stripped, and
co-evaporated with ethanol. The product is taken up in water and
the yield calculated using the absorption coefficient of the
starting nucleotide. The ¹H NMR and ³¹P NMR spectra are
consistent with the expected structure and confirm the absence of
phosphorus-containing or nucleotide-derived impurities. The
materials are observed to be >99% pure by HPLC (Waters Bondapak
C-18, 4.6 x 250 mm, 1 ml/min, 1 to 35% CH3CN/pH 7/0.01 M
triethylammonium acetate).
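
The reagent quantities in this procedure follow from simple molar arithmetic: 100 µmol of reagent is a two-fold molar excess over 50 µmol of nucleotide, and the acetonitrile step gradient (0% to 36% in 2% increments, 8 ml per step) has a fixed number of steps and total volume. The Python sketch below reproduces both calculations. The molecular weight used for sulfosuccinimidyl 6-(biotinamido)hexanoate sodium salt (556.59 g/mol) is an assumption not stated in the text, although it is consistent with the 55.7 mg quoted for 100 µmol; the helper names are hypothetical.

```python
# Assumed molecular weight (g/mol) of sulfosuccinimidyl
# 6-(biotinamido)hexanoate, sodium salt; not stated in the text.
SULFO_NHS_LC_BIOTIN_MW = 556.59


def reagent_mass_mg(nucleotide_umol, molar_excess, mw_g_per_mol):
    """Mass of reagent needed for a given molar excess over the nucleotide."""
    reagent_umol = nucleotide_umol * molar_excess
    return reagent_umol * mw_g_per_mol / 1000.0  # umol x g/mol = ug; /1000 -> mg


def step_gradient(start_pct, end_pct, increment_pct, ml_per_step):
    """(percent acetonitrile, volume in ml) for each step of a step gradient."""
    return [(pct, ml_per_step)
            for pct in range(start_pct, end_pct + 1, increment_pct)]


mass = reagent_mass_mg(50, 2, SULFO_NHS_LC_BIOTIN_MW)
steps = step_gradient(0, 36, 2, 8)
print(f"{mass:.1f} mg reagent")                                    # 55.7 mg reagent
print(len(steps), "steps,", sum(v for _, v in steps), "ml total")  # 19 steps, 152 ml total
```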

The synthesis of 5-(3-(6-(biotinamido)hexanoamido)propyn-1-yl)-
2',3'-dideoxyuridine-5'-triphosphate has an approximate yield of
25% (assuming ε = 12,400 at 291.5 nm); HPLC tR = 16.1 min.

The synthesis of 5-(3-(6-(biotinamido)hexanoamido)propyn-1-yl)-
2',3'-dideoxycytidine-5'-triphosphate has an approximate yield of
63% (assuming ε = 9,230 at 294.5 nm); HPLC tR = 19.4 min.

The synthesis of 7-(3-(6-(biotinamido)hexanoamido)propyn-1-yl)-
7-deaza-2',3'-dideoxyadenosine-5'-triphosphate has an approximate
yield of 39% (assuming ε = 13,600 at 278.5 nm); HPLC tR = 23.1 min.

The synthesis of 7-(3-(6-(biotinamido)hexanoamido)propyn-1-yl)-
7-deaza-2',3'-dideoxyguanosine-5'-triphosphate has an approximate
yield of 44% (assuming ε = 9,300 at 291 nm); HPLC tR = 21.2 min.
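
Each yield above rests on converting a measured absorbance into an amount of product with the stated absorption coefficient, i.e., the Beer-Lambert relation c = A/(ε·l). A minimal Python sketch follows, assuming ε is a molar coefficient in M⁻¹cm⁻¹, a 1 cm path length, and that the absorbance is read on a diluted aliquot of the pooled product; the absorbance, dilution and volume in the usage line are hypothetical numbers chosen to illustrate a 25% yield, not measured data.

```python
def product_umol(absorbance, epsilon, volume_ml, dilution=1.0, path_cm=1.0):
    """Amount of product (umol) from an aliquot's absorbance (Beer-Lambert).

    epsilon is assumed to be a molar absorption coefficient in M^-1 cm^-1.
    """
    conc_molar = absorbance * dilution / (epsilon * path_cm)
    return conc_molar * volume_ml * 1000.0  # M x ml = mmol; x1000 -> umol


def percent_yield(absorbance, epsilon, volume_ml, start_umol, dilution=1.0, path_cm=1.0):
    """Yield relative to the starting nucleotide, in percent."""
    made = product_umol(absorbance, epsilon, volume_ml, dilution, path_cm)
    return 100.0 * made / start_umol


# Hypothetical reading: a 1:250 dilution of 2.0 ml of the ddU product
# reads A = 0.31 at 291.5 nm, using epsilon = 12,400 as stated above.
print(round(percent_yield(0.31, 12_400, 2.0, 50, dilution=250), 1))  # 25.0
```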

E. Solid Phase Analysis of Polymorphic Sites

1. Polymerase-Mediated Analysis

Although the identity of the nucleotide(s) of the polymorphic
sites of the present invention can be determined in a variety of
ways, an especially preferred method exploits the oligonucleotide-based
diagnostic assay of nucleic acid sequence variation disclosed
by Goelet, P. et al. (PCT Application WO 92/15712, herein
incorporated by reference). In this assay, a purified
oligonucleotide having a defined sequence (complementary to an
immediately proximal or distal sequence of a polymorphism) is bound
to a solid support, especially a microtiter dish. A sample
suspected to contain the target molecule, or an amplification
product thereof, is placed in contact with the support, and any
target molecules present are permitted to hybridize to the bound
oligonucleotide.
In one preferred embodiment, an oligonucleotide having a
sequence that is complementary to an immediately distal sequence
of a polymorphism is prepared using the above-described methods
(and preferably that of Nikiforov, T., U.S. Patent Application Serial
No. 08/005,061). The terminus of the oligonucleotide is attached
to the solid support, as described, for example, by Goelet, P. et al.
(PCT Application WO 92/15712), such that the 3'-end of the
oligonucleotide can serve as a substrate for primer extension.

The immobilized primer is then incubated in the presence of a
DNA molecule (preferably a genomic DNA molecule) having a single
nucleotide polymorphism whose immediately 3'-distal sequence is
complementary to that of the immobilized primer. Preferably, such
incubation occurs in the complete absence of any dNTP (i.e., dATP,
dCTP, dGTP, or dTTP), and only in the presence of one or more
chain-terminating nucleotide triphosphate derivatives (such as a
dideoxy derivative), under conditions sufficient to permit the
incorporation of such a derivative onto the 3'-terminus of the
primer. As will be appreciated, where the polymorphic site is such
that only two or three alleles exist (such that only two or three
species of ddNTPs, respectively, could be incorporated into the
primer extension product), the presence of unusable nucleotide
triphosphate(s) in the reaction is immaterial. As a consequence of
the incubation, and the use of only chain-terminating nucleotide
derivatives, a single dideoxynucleotide is added to the 3'-terminus
of the primer. The identity of that added nucleotide is determined
by, and is complementary to, the nucleotide of the polymorphic site
of the polymorphism.

In this embodiment, the nucleotide of the polymorphic site is
thus determined by assaying which of the set of labeled
nucleotides has been incorporated onto the 3'-terminus of the
bound oligonucleotide by a primer-dependent polymerase. Most
preferably, where multiple dideoxynucleotide derivatives are
simultaneously employed, different labels will be used to permit
the differential determination of the identity of the incorporated
dideoxynucleotide derivative.
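
The logic of the single-base extension can be made concrete with a small string model. In the Python sketch below, both the template and the immobilized primer are written 5'-to-3'; the primer anneals to the template segment equal to its reverse complement, and the template base immediately 5' of that segment is taken to be the polymorphic site. The dideoxynucleotide added is the complement of that base, after which extension stops. The sequences and function names are hypothetical, and the model ignores hybridization stringency, mismatches and labels.

```python
# Complement table for A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def incorporated_ddntp(template, primer):
    """Chain terminator added to a primer ending one base short of the SNP.

    template and primer are both written 5'->3'. The primer anneals to the
    template segment equal to its reverse complement; the template base just
    5' of that segment is the polymorphic site, and the polymerase adds the
    complementary dideoxynucleotide, after which extension stops.
    """
    bound = reverse_complement(primer)
    start = template.find(bound)
    if start < 1:  # not found (-1) or no template base left to copy (0)
        raise ValueError("primer does not anneal next to a template base")
    site = template[start - 1]
    return "dd" + site.translate(_COMPLEMENT) + "TP"


# Two hypothetical alleles differing at one position (C vs. T).
print(incorporated_ddntp("AAGCTCGATTC", "GAATC"))  # ddGTP
print(incorporated_ddntp("AAGCTTGATTC", "GAATC"))  # ddATP
```

With differentially labeled terminators, reading which label is attached to the well corresponds to reading the return value here.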
2. Polymerase/Ligase-Mediated Analysis

In an alternative embodiment, the identity of the nucleotide
of the polymorphic site is determined using a polymerase/ligase-mediated
process. As in the above embodiment, an oligonucleotide
primer is employed that is complementary to the immediately
3'-distal invariant sequence of the SNP. A second oligonucleotide is
tethered to the solid phase via its 3'-end. The sequence of this
oligonucleotide is complementary to the 5'-proximal sequence of
the polymorphism being analyzed, but is incapable of hybridizing to
the oligonucleotide primer.

These oligonucleotides are incubated in the presence of DNA
containing the single nucleotide polymorphism that is to be
analyzed, and at least one 2', 5'-deoxynucleotide triphosphate. The
incubation reaction further includes a DNA polymerase and a DNA
ligase. Thus, for example, where the polymorphism of clone 177-2
(Table 1) is being evaluated, the tethered oligonucleotide could
comprise the 3'-distal sequence of SEQ ID NO:2, and the second
oligonucleotide would have the 5'-proximal sequence of SEQ ID
NO:1.

The tethered and soluble oligonucleotides are thus capable of
hybridizing to the same strand of the single nucleotide
polymorphism under analysis. The sequence considerations cause
the two oligonucleotides to hybridize to the proximal and distal
sequences of the SNP that flank the polymorphic site (X) of the
polymorphism; the hybridized oligonucleotides are thus separated
by a "gap" of a single nucleotide at the precise position of the
polymorphic site.

The presence of a polymerase and a 2', 5'-deoxynucleotide
triphosphate complementary to (X) permits the primer to be extended
by that nucleotide and then ligated to the immobilized
oligonucleotide complementary to the distal sequence. Only a
2', 5'-deoxynucleotide triphosphate that is complementary to the
nucleotide of the polymorphic site permits the creation of such a
ligatable substrate. The ligation reaction immobilizes the
2', 5'-deoxynucleotide and the previously soluble primer
oligonucleotide to the solid support.

The identity of the polymorphic site that was opposite the
"gap" can then be determined by any of several means. In a
preferred embodiment, the 2', 5'-deoxynucleotide triphosphate of
the reaction is labeled, and its detection thus reveals the identity
of the complementary nucleotide of the polymorphic site. Several
different 2', 5'-deoxynucleotide triphosphates may be present, each
differentially labeled. Alternatively, separate reactions can be
conducted, each with a different 2', 5'-deoxynucleotide
triphosphate. In an alternative sub-embodiment, the
2', 5'-deoxynucleotide triphosphates are unlabeled, and the
soluble primer oligonucleotide is labeled. Separate reactions are
conducted, each using a different unlabeled 2', 5'-deoxynucleotide
triphosphate. Only the reaction that contains the complementary
nucleotide permits the ligatable substrate to form, and that
reaction is identified by detecting the immobilization of the
previously soluble labeled oligonucleotide.
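
The polymerase/ligase variant can be modelled the same way. The Python sketch below assumes one reading of the geometry described above: the soluble primer and the tethered oligonucleotide anneal to the same template strand on either side of the polymorphic site, the tethered oligonucleotide (attached via its 3'-end) presents a free 5'-end, and the extended primer is ligated to that 5'-end only when the added nucleotide is complementary to the site. All sequences and function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def gap_fill_ligation(template, primer, tethered, dntp):
    """Return the ligated strand, or None if no ligatable substrate forms.

    All sequences are 5'->3'. The tethered oligonucleotide and the soluble
    primer anneal to the same template strand, leaving a one-base gap at the
    polymorphic site. The polymerase can fill that gap only with the
    complementary nucleotide; the ligase then joins the extended primer's
    3'-end to the tethered oligonucleotide's free 5'-end.
    """
    tethered_site = reverse_complement(tethered)  # template segment 5' of the gap
    primer_site = reverse_complement(primer)      # template segment 3' of the gap
    i = template.find(tethered_site)
    if i < 0:
        return None
    gap = i + len(tethered_site)
    if gap >= len(template) or not template.startswith(primer_site, gap + 1):
        return None  # the two oligonucleotides do not flank a single-base gap
    if dntp != template[gap].translate(_COMPLEMENT):
        return None  # wrong nucleotide: the gap stays open, nothing is ligated
    return primer + dntp + tethered


template = "AAGCTCGATTC"  # hypothetical; the polymorphic site is the C at index 5
for base in "ACGT":
    print(base, gap_fill_ligation(template, "GAATC", "AGCTT", base))
```

Running one reaction per nucleotide, as in the separate-reaction sub-embodiment, corresponds to the loop: only one base yields a ligated, immobilized product.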

F. Signal Amplification

The sensitivity of nucleic acid hybridization detection assays
may be increased by altering the manner in which detection is
reported or signaled to the observer. Thus, for example, assay
sensitivity can be increased through the use of detectably labeled
reagents. A wide variety of such signal amplification methods
have been designed for this purpose. Kourilsky et al. (U.S. Patent
4,581,333) describe the use of enzyme labels to increase
sensitivity in a detection assay. Fluorescent labels (Albarella et
al., EP 144914), chemical labels (Sheldon III et al., U.S. Patent
4,582,789; Albarella et al., U.S. Patent 4,563,417), modified bases
(Miyoshi et al., EP 119448), etc. have also been used in an effort to
improve the efficiency with which hybridization can be observed.
It is preferable to employ fluorescent, and more preferably
chromogenic (especially enzyme), labels, such that the identity of
the incorporated nucleotide can be determined in an automated or
semi-automated manner using a spectrophotometer.

IV. The Use of SNP Genotyping in Methods of Genetic
Analysis

A. General Considerations for Using Single 
Nucleotide Polymorphisms in Genetic Analysis 

The utility of the polymorphic sites of the present invention
stems from the ability to use such sites to predict the statistical
probability that two individuals will have the same alleles for any
given polymorphism.

Statistical analysis of SNPs can be used for any of a variety
of purposes. Where a particular animal has been previously tested,
such testing can be used as a "fingerprint" with which to determine
whether a certain animal is, or is not, that particular animal.

Where a putative parent or both parents of an individual have
been tested, the methods of the present invention may be used to
determine the likelihood that a particular animal is or is not the
progeny of such parent or parents. Thus, the detection and analysis
of SNPs can be used to exclude paternity of a male for a particular
individual (such as a stallion's paternity of a particular foal), or to
assess the probability that a particular individual is the progeny of
a selected female (such as whether a particular foal is the progeny
of a selected mare).

As indicated below, the present invention permits the
construction of a genetic map of a target species. Thus, the
particular array of polymorphisms identified by the methods of the
present invention can be correlated with a particular trait, in order
to predict the predisposition of a particular animal (or plant) to
such a genetic disease, condition, or trait. As used herein, the term
"trait" is intended to encompass "genetic disease," "condition," or
"characteristic." The term "genetic disease" denotes a
pathological state caused by a mutation, regardless of whether
that state can be detected or is asymptomatic. A "condition"
denotes a predisposition to a characteristic (such as asthma, weak
bones, blindness, ulcers, cancers, heart or cardiovascular illnesses,
skeleto-muscular defects, etc.). A "characteristic" is an attribute
that imparts economic value to a plant or animal. Examples of
characteristics include longevity, speed, endurance, rate of aging,
fertility, etc.

B. Identification and Parentage Verification 

The most useful measurements for determining the power of
an identification and paternity testing system are: (i) the
"probability of identity" (p(ID)) and (ii) the "probability of
exclusion" (p(exc)). The p(ID) calculates the likelihood that two
random individuals will have the same genotype with respect to a
given polymorphic marker. The p(exc) calculates the likelihood,
with respect to a given polymorphic marker, that a random male
will have a genotype incompatible with him being the father in an
average paternity case in which the identity of the mother is not in
question. Since single genetic loci, including loci with numerous
alleles such as the major histocompatibility region, rarely provide
tests with adequate statistical confidence for paternity testing, a
desirable test will preferably measure multiple unlinked loci in
parallel. Cumulative probabilities of identity or non-identity, and
cumulative probabilities of paternity exclusion, are determined for
these multi-locus tests by combining the probabilities provided
by each locus: the per-locus probabilities of identity (or of
non-exclusion) are multiplied together.

The statistical measurements of greatest interest are: (i) the
cumulative probability of non-identity (cum p(nonID)), and (ii) the
cumulative probability of paternity exclusion (cum p(exc)).

The formulas used for calculating these probability values
are given below. For simplicity these are given first for 2-allele
loci, where one allele is termed type A and the other type B. In
such a model, four genotypes are possible: AA, AB, BA, and BB
(types AB and BA being indistinguishable biochemically). The
allelic frequency is given by the fraction of times A (f(A), the
frequency of A, is denoted by "p") or B (f(B), the frequency of B, is
denoted by "q," where q = 1-p) is found in the haploid genome. The
probability of a given genotype at a given locus is:

Homozygote: $p(AA) = p^2$

Single Heterozygote: $p(AB) = p(BA) = pq = p(1-p)$

Both Heterozygotes: $p(AB+BA) = 2pq = 2p(1-p)$

Homozygote: $p(BB) = q^2 = (1-p)^2$

The probability of identity at one locus (i.e., the probability
that two individuals, picked at random from a population, will have
identical genotypes at a given locus) is given by the equation:

$p(ID) = (p^2)^2 + (2pq)^2 + (q^2)^2$

The cumulative probability of identity for n loci is therefore
given by the equation:

$\mathrm{cum}\ p(ID) = \prod_{i=1}^{n} p(ID_i) = p(ID_1)\,p(ID_2)\,p(ID_3) \cdots p(ID_n)$

The cumulative probability of non-identity for n loci (i.e., the
probability that two individuals will be different at 1 or more loci)
is given by the equation:

$\mathrm{cum}\ p(nonID) = 1 - \mathrm{cum}\ p(ID)$
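
For a two-allele locus, the per-locus p(ID) and the cumulative p(nonID) over n unlinked loci can be computed directly from the formulas above. A minimal Python sketch follows, assuming Hardy-Weinberg genotype frequencies as in those formulas; the function names and the ten-locus panel with p = 0.5 are hypothetical.

```python
from math import prod


def p_id_biallelic(p):
    """p(ID) = (p^2)^2 + (2pq)^2 + (q^2)^2 with q = 1 - p."""
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def cum_p_nonid(per_locus_p_id):
    """cum p(nonID) = 1 - product of the per-locus p(ID) over unlinked loci."""
    return 1.0 - prod(per_locus_p_id)


# Hypothetical panel: ten unlinked two-allele loci, each with p = 0.5.
panel = [p_id_biallelic(0.5)] * 10
print(p_id_biallelic(0.5))               # 0.375
print(round(cum_p_nonid(panel), 4))      # 0.9999
```

The multiplication is only valid for unlinked loci, which is why the text asks for multiple unlinked markers.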
The probability of parentage exclusion (representing the
probability that a random male will have a genotype, with respect
to a given locus, that makes him incompatible as the sire in an
average paternity case where the identity of the mother is not in
question) is given by the equation:

$p(exc) = pq(1 - pq)$

The probability of non-exclusion (representing the probability
at a given locus that a random male will not be biochemically
excluded as the sire in an average paternity case) is given by the
equation:

$p(non\text{-}exc) = 1 - p(exc)$

The cumulative probability of non-exclusion (representing the
value obtained when n loci are used) is thus:

$\mathrm{cum}\ p(non\text{-}exc) = \prod_{i=1}^{n} p(non\text{-}exc_i) = p(non\text{-}exc_1)\,p(non\text{-}exc_2)\,p(non\text{-}exc_3) \cdots p(non\text{-}exc_n)$

The cumulative probability of exclusion (representing the
probability, using a panel of n loci, that a random male will be
biochemically excluded as the sire in an average paternity case
where the mother is not in question) is given by the equation:

$\mathrm{cum}\ p(exc) = 1 - \mathrm{cum}\ p(non\text{-}exc)$
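
The exclusion formulas translate the same way: per locus, p(exc) = pq(1 - pq), and the cumulative exclusion probability is one minus the product of the per-locus non-exclusion probabilities. In the Python sketch below the function names and the 20-locus panel are hypothetical; at p = q = 0.5 the per-locus value reaches its two-allele maximum of 0.1875.

```python
from math import prod


def p_exc_biallelic(p):
    """p(exc) = pq(1 - pq) for a two-allele locus, with q = 1 - p."""
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def cum_p_exc(per_locus_p_exc):
    """cum p(exc) = 1 - product over loci of p(non-exc) = 1 - p(exc)."""
    return 1.0 - prod(1.0 - e for e in per_locus_p_exc)


print(p_exc_biallelic(0.5))           # 0.1875
loci = [p_exc_biallelic(0.79)] * 20   # hypothetical 20-locus panel
print(round(cum_p_exc(loci), 3))
```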

These calculations may be extended for any number of alleles
at a given locus. For example, the probability of identity p(ID) for a
3-allele system where the alleles have the frequencies in the
population of p, q and r, respectively, is equal to the sum of the
squares of the genotype frequencies:

$p(ID) = p^4 + (2pq)^2 + (2qr)^2 + (2pr)^2 + r^4 + q^4$

Similarly, the probability of exclusion for a three-allele
system is given by:

$p(exc) = pq(1-pq) + qr(1-qr) + pr(1-pr) + 3pqr(1-pqr)$

In a locus of n alleles, the appropriate binomial expansion is
used to calculate p(ID) and p(exc).
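
For any number of alleles, p(ID) is the sum of the squares of the genotype frequencies, which can be obtained by enumerating every unordered pair of alleles (equivalently, expanding (p1 + p2 + ... + pn)^2 into genotype terms). The Python sketch below assumes Hardy-Weinberg proportions and pools the two heterozygote orders, as in the two-allele formulas; with two alleles it reduces to the two-allele p(ID), and with three it reproduces the six-term sum given above. The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for a codominant locus.

    Homozygote i/i -> p_i**2; heterozygote i/j (ij and ji pooled) -> 2*p_i*p_j.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(allele_freqs)
    return {
        (i, j): allele_freqs[i] ** 2 if i == j else 2 * allele_freqs[i] * allele_freqs[j]
        for i, j in combinations_with_replacement(range(n), 2)
    }


def p_id(allele_freqs):
    """p(ID): sum of the squared genotype frequencies, for any number of alleles."""
    return sum(f * f for f in genotype_frequencies(allele_freqs).values())


print(round(p_id([0.51, 0.34, 0.15]), 4))  # 0.2356
```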

Figures 4 and 5 show how the cum p(nonID) and the
cum p(exc) increase with both the number and type of genetic loci
used. It can be seen that greater discriminatory power is achieved
with fewer markers when using three-allele systems. In Figures 4
and 5, the triangles trace the increase in probability values with
increasing numbers of loci with two alleles, where the common
allele is present at a frequency of p = 0.79. The crosses in Figures
4 and 5 show the same analysis for increasing numbers of
three-allele loci where p = 0.51, q = 0.34 and r = 0.15.
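
The comparison drawn from Figures 4 and 5 can be checked numerically for the identity curves. The Python sketch below computes cum p(nonID) for increasing numbers of unlinked loci using the two allele-frequency sets given above, and the number of loci needed to reach a chosen cum p(nonID). The 0.999 target and the function names are illustrative assumptions; the sketch does not reproduce the figures themselves or the exclusion curves.

```python
from itertools import combinations_with_replacement


def p_id(allele_freqs):
    """Per-locus p(ID): sum of squared Hardy-Weinberg genotype frequencies."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        g = allele_freqs[i] ** 2 if i == j else 2 * allele_freqs[i] * allele_freqs[j]
        total += g * g
    return total


def cum_p_nonid_curve(allele_freqs, max_loci):
    """cum p(nonID) for 1..max_loci unlinked loci with identical allele frequencies."""
    single = p_id(allele_freqs)
    return [1.0 - single ** n for n in range(1, max_loci + 1)]


def loci_needed(allele_freqs, target):
    """Smallest number of such loci whose cum p(nonID) reaches target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    single = p_id(allele_freqs)
    if single >= 1.0:
        raise ValueError("a monomorphic locus never discriminates")
    n, cum_id = 0, 1.0
    while 1.0 - cum_id < target:
        n += 1
        cum_id *= single
    return n


two_allele = [0.79, 0.21]          # triangles in Figures 4 and 5
three_allele = [0.51, 0.34, 0.15]  # crosses in Figures 4 and 5
for label, freqs in (("2-allele", two_allele), ("3-allele", three_allele)):
    curve = cum_p_nonid_curve(freqs, 10)
    print(label, [round(v, 4) for v in curve[:5]], loci_needed(freqs, 0.999))
```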

The choice of whether to use loci with 2, 3 or more
alleles is, however, largely influenced by the above-described
biochemical considerations. A polymorphic analysis test may be
designed to score for any number of alleles at a given locus. If
allelic scoring is to be performed using gel electrophoresis, each
allele should be easily resolvable by gel electrophoresis. Since the
length variations in multiple allelic families are often small,
human DNA tests using multiple allelic families include statistical
corrections for mistaken identification of alleles. Furthermore,
although the appearance of a rare allele from a multiple allelic
system may be highly informative, the rarity of these alleles
makes accurate measurements of their frequency in the population
extremely difficult. To correct for errors in these frequency
estimates when using rare alleles, the statistical analysis of these
data must include a measure of the cumulative effects of
uncertainty in the frequency estimates. The use of these
multiple allelic systems also increases the likelihood that new or
rare alleles in the population will be discovered during the course
of large population screening. Previously collected genetic data
would then have to be revised empirically to reflect the discovery
of a new allele.

In view of these considerations, although the use of loci with
many alleles could potentially offer some short-term advantages
(because fewer loci would need to be screened), it is preferable to
perform polymorphic analyses using loci with fewer alleles that
are: (i) more frequently represented, and (ii) easier to measure
unambiguously. Tests of this type can achieve the same power of
discrimination as tests based on more highly polymorphic loci,
provided the same total number of alleles is collected from a
series of unlinked loci.

C. Gene Mapping and Genetic Trait Analysis Using SNPs

The polymorphisms detected in a set of individuals of the
same species (such as humans, horses, etc.), or of closely related
species, can be analyzed to determine whether the presence or
absence of a particular polymorphism correlates with a particular
trait.

To perform such polymorphic analysis, the presence or
absence of a set of polymorphisms (i.e., a "polymorphic array") is
determined for a set of individuals, some of which exhibit a
particular trait, and some of which exhibit a mutually exclusive
characteristic (for example, with respect to horses, brittle bones
vs. non-brittle bones; maturity-onset blindness vs. no blindness;
predisposition to asthma or cardiovascular disease vs. no such
predisposition). The alleles of each polymorphism of the set are
then reviewed to determine whether the presence or absence of a
particular allele is associated with the particular trait of interest.
Any such correlation defines a genetic map of the individual's
species. Alleles that do not segregate randomly with respect to a
trait can be used to predict the probability that a particular animal
will express that characteristic. For example, if a particular
polymorphic allele is present in only 20% of the members of a
species that exhibit a cardiovascular condition, then a particular
member of that species containing that allele would have a 20%
probability of exhibiting such a cardiovascular condition. As
indicated, the predictive power of the analysis increases with the
extent of linkage between a particular polymorphic allele and a
particular characteristic. Similarly, the predictive power of the
analysis can be increased by simultaneously analyzing the alleles
of multiple polymorphic loci and a particular trait. In the above
example, if a second polymorphic allele were also found to be
present in 20% of members exhibiting the cardiovascular condition,
and, moreover, all of the evaluated members that exhibited such a
cardiovascular condition had a particular combination of alleles for
these first and second polymorphisms, then a particular member
containing both such alleles would have a very high probability of
exhibiting the cardiovascular condition.

The detection of multiple polymorphic sites permits one to
define the frequency with which such sites independently
segregate in a population. If, for example, two polymorphic sites
segregate randomly, then they are either on separate chromosomes,
or are distant from one another on the same chromosome. Conversely,
two polymorphic sites that are co-inherited at significant
frequency are linked to one another on the same chromosome. An
analysis of the frequency of segregation thus permits the
establishment of a genetic map of markers. Thus, the present
invention provides a means for mapping the genomes of plants and
animals.

The resolution of a genetic map is proportional to the number
of markers that it contains. Since the methods of the present
invention can be used to isolate a large number of polymorphic
sites, they can be used to create a map having any desired degree of
resolution.
The sequencing of the polymorphic sites greatly increases
their utility in gene mapping. Such sequences can be used to design
oligonucleotide primers and probes that can be employed to "walk"
down the chromosome and thereby identify new marker sites
(Bender, W. et al., J. Supra. Molec. Struc. 10(suppl.):flp (1979);
Chinault, A.C. et al., Gene 5:111-126 (1979); Clarke, L. et al., Nature
287:504-509 (1980)).

The resolution of the map can be further increased by
combining polymorphic analyses with data on the phenotype of
other attributes of the plant or animal whose genome is being
mapped. Thus, if a particular polymorphism segregates with brown
hair color, then that polymorphism maps to a locus near the gene or
genes that are responsible for hair color. Similarly, biochemical
data can be used to increase the resolution of the genetic map. In
this embodiment, a biochemical determination (such as a serotype,
isoform, etc.) is studied in order to determine whether it
co-segregates with any polymorphic site. Such maps can be used,
for example, to identify new gene sequences or to identify the
causal mutations of disease.
Indeed, the identification of the SNPs of the present
invention permits one to use complementary oligonucleotides as
primers in PCR or other reactions to isolate and sequence novel
gene sequences located on either side of the SNP. The invention
includes such novel gene sequences. The genomic sequences that
can be clonally isolated through the use of such primers can be
transcribed into RNA, and expressed as protein. The present
invention also includes such protein, as well as antibodies and
other binding molecules capable of binding to such protein.

The invention is illustrated below with respect to two of its
embodiments: horses and humans. However, because the
fundamental tenets of genetics apply irrespective of species, such
illustration is equally applicable to any other species. Those of
ordinary skill would therefore need only to directly employ the
methods of the above invention to isolate SNPs in any other
species, and to thereby conduct the genetic analysis of the present
invention.
As indicated above, LOD scoring methodology has been
developed to permit the use of RFLPs both to track the inheritance
of genetic traits and to construct a genetic map of a species
(Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 83:7353-7357
(1986); Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:2363-2367
(1987); Donis-Keller, H. et al., Cell 51:319-337 (1987); Lander, S.
et al., Genetics 121:185-199 (1989)). Such methods can be readily
adapted to permit their use with the polymorphisms of the present
invention. Indeed, such polymorphisms are superior to RFLPs and
STRs in this regard. Due to the frequency of SNPs, it is possible to
readily generate a dense genetic map. Moreover, as indicated
above, the polymorphisms of the present invention are more stable
than typical (VNTR-type) RFLP polymorphisms.
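
LOD scoring compares the likelihood of the observed recombinants under linkage at recombination fraction θ with the likelihood under free recombination (θ = 0.5). The Python sketch below uses the standard two-point form for phase-known meioses, LOD(θ) = log10[θ^R (1 - θ)^NR / 0.5^(R+NR)], together with a simple grid search for the best θ. The counts in the usage line and the function names are hypothetical, and real pedigree analyses (phase-unknown meioses, missing genotypes, multipoint maps) need the fuller methods cited above.

```python
from math import log10


def lod_score(recombinants, nonrecombinants, theta):
    """LOD for phase-known meioses: log10[theta^R (1-theta)^NR / 0.5^(R+NR)]."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    # Work in log space to avoid underflow for large numbers of meioses.
    return (recombinants * log10(theta)
            + nonrecombinants * log10(1.0 - theta)
            - n * log10(0.5))


def best_theta(recombinants, nonrecombinants, grid_steps=50):
    """Grid search over theta = 0.01, 0.02, ..., 0.50 (default) for the maximum LOD."""
    grid = [k / (2 * grid_steps) for k in range(1, grid_steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))


# Hypothetical data: 2 recombinants among 20 informative meioses.
theta_hat = best_theta(2, 18)
print(theta_hat, round(lod_score(2, 18, theta_hat), 2))
```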

The polymorphisms of the present invention comprise direct
genomic sequence information and can therefore be typed by a
number of methods. In an RFLP- or STR-dependent map, the analysis
must be gel-based, and entails obtaining an electrophoretic profile
of the DNA of the target animal. In contrast, an analysis of the
polymorphisms (SNPs) of the present invention may be performed
using spectrophotometric methods, and can readily be automated to
facilitate the analysis of large numbers of target animals.

Having now generally described the invention, the same will
be more readily understood through reference to the following
examples of the isolation and analysis of equine polymorphisms,
which are provided by way of illustration and are not intended to
be limiting of the present invention.
EXAMPLE 1

DISCOVERY OF EQUINE POLYMORPHISMS

As an initial step in the identification of equine
polymorphisms, small shotgun libraries were prepared from
genomic DNA isolated from peripheral blood leukocytes which had
been purified on a Ficoll-Hypaque density gradient from the blood
of a single, 15-year-old thoroughbred gelding (John Henry). This
DNA was simultaneously digested to completion with Bam HI and
Pst I and either used directly or after size fractionation on agarose
gels.

Vector pLT14 (a variant of the Stratagene plasmid pKSM13(-))
was digested with Bam HI and Pst I, and linearized DNA was
purified from an agarose gel. For both vector and size-fractionated
genomic DNA, agarose plugs were solubilized in saturated sodium
iodide and the DNA was subsequently immobilized on glass powder.
After washing, the DNA was eluted with water and ethanol
precipitated with glycogen carrier.

Ligations with varying vector/insert ratios were performed
with T4 DNA ligase at 4°C. E. coli strain XL1 was transformed with
ligation mixtures and plated on LB agar containing 100 µg/ml
ampicillin. Approximately 50,000 clones were generated in several
different experiments using size-fractionated or unfractionated
insert DNA. Unplated transformed cells were stored at -70°C in 7%
DMSO. Colonies were streaked for isolation, and small-scale
plasmid preparations were performed to determine the size of
inserted equine DNA. Larger-scale preparations were performed
with Qiagen chromatography.

The sequence of the first 200-300 nucleotides of the genomic
insert was determined by the chain-terminating dideoxynucleoside
method with T7 DNA polymerase from primers complementary to
plasmid sequences. This information was used to design synthetic
oligonucleotide primers complementary to the equine sequence for
use in PCR reactions.

In most cases, two sets of PCR primers (generally 25-mers)
were synthesized. The first set was used to amplify genomic DNA
under a standardized set of conditions. The products of these
reactions were diluted and used as template DNA in a second PCR
using nested primers slightly internal to the original set. The
products of these two reactions were compared to those obtained
using the original plasmid DNA as template. In most cases, it was
possible to obtain high-quality, single-species products using this
procedure with no attempt to optimize reaction conditions for any
particular pair of primers.






Two different methods were used to screen amplified DNA
from horses for polymorphic sequences. Initially, PCR fragments
from a panel of 6 horses were digested with a panel of restriction
endonucleases having 4-base recognition sites. The products of
these reactions were analyzed by electrophoresis on 5%-7.5%
non-denaturing acrylamide gels. Digestion products that differed
among members of the panel were subjected to DNA sequence
analysis. Later, DNA sequencing was used directly to screen for
polymorphic sites. The PCR fragments from five unrelated horses
were electroeluted from acrylamide gels and sequenced using
repeated cycles of thermostable Taq polymerase reaction in the
presence of a mixture of dNTPs and fluorescent ddNTPs. The
products were then separated and analyzed using an automated DNA
sequencing instrument from Applied Biosystems, Inc., and the data
were analyzed using ABI software. Differences between the
sequences of different animals were identified by the software and
confirmed by inspecting the relevant portion of the chromatograms
on the computer screen. A difference was concluded to be a DNA
polymorphism only if data were available for both strands and/or
the variant was present in more than one haploid example among
the five horses tested.
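The acceptance rule above can be written as a small filter. The sketch below is a hedged illustration and not the ABI software used in the study. It assumes that the calls for the five horses at one candidate site have already been reduced to two-letter diploid genotypes, and that a separate flag records whether both strands were read. With `require_both=True` it applies the stricter rule stated in Example 2 (both strands and more than one haploid copy). With `require_both=False` it follows the "and/or" wording used here. The names `minor_allele_copies` and `is_scored` are hypothetical.

```python
from collections import Counter


def minor_allele_copies(genotypes):
    """Count haploid copies of the rarer allele among diploid calls such as "CT"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0  # monomorphic in this panel: no variant observed
    return min(counts.values())


def is_scored(genotypes, both_strands, require_both=True):
    """Decide whether a sequence difference is accepted as a polymorphism."""
    copies = minor_allele_copies(genotypes)
    if copies == 0:
        return False
    seen_more_than_once = copies > 1
    if require_both:
        return both_strands and seen_more_than_once
    return both_strands or seen_more_than_once


panel = ["CC", "CT", "CC", "TT", "CC"]  # hypothetical calls for five horses
print(is_scored(panel, both_strands=True))  # True: three copies of T, both strands read
```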

EXAMPLE 2 

CHARACTERIZATION OF EQUINE POLYMORPHISMS 

The program of identification and characterization of
polymorphic DNA sequences in randomly selected fragments was
continued, and approximately 550 plasmids have been characterized
to this level. The sequences adjacent to the cloning sites were
determined for 200 of these plasmids. Inserts of these sequenced
plasmids ranged in size from 0.25 to 3.5 kb. Using this sequence
information, oligonucleotide primers were designed to enable PCR
amplification of the same genomic region from different horses.

In order to identify the nucleotides present at polymorphic
sites, PCR fragments from 5 horses were purified from acrylamide
gels by electroelution and completely sequenced using Taq
polymerase "cycle" sequencing biochemistry and automated
sequencing equipment. Results from the 5 horses were analyzed by
computer and visually confirmed. DNA sequence variants
discovered by this method were scored only if the sequence was
obtained on both strands and the variant sequence had been found in
more than one haploid example. The 18 clones of Table 1 comprise
a subset of the identified SNPs. For each SNP, Table 1 presents in
a single horizontal row the immediately 5'-proximal sequence, the
identity of the nucleotide at the polymorphic site, and the
immediately 3'-distal sequence. The sequences of double-stranded
DNA in Table 1 are presented in compliance with the Sequence
Listing requirements of the United States Patent and Trademark
Office; thus, all sequences are presented in the same orientation
(5'->3'). The organization of the Table is illustrated in Figure 6
with respect to an illustrative SNP, clone 177-2. This SNP has a
polymorphic site capable of having either a C or a T in one strand,
and a G or an A in the opposite strand. The 5'-proximal DNA sequence
that immediately precedes the polymorphic site in the C/T strand is
designated SEQ ID NO:1. The 3'-distal sequence that immediately
follows the polymorphic site in the C/T strand is designated SEQ ID
NO:2. The 5'-proximal DNA sequence that immediately precedes the
polymorphic site in the G/A strand is designated SEQ ID NO:3. The
3'-distal sequence that immediately follows the polymorphic site in
the G/A strand is designated SEQ ID NO:4. Bearing in mind that the
sequences are written in the same orientation (5'->3'), it will be
seen that SEQ ID NO:1 and SEQ ID NO:4 are complementary; similarly,
SEQ ID NO:2 and SEQ ID NO:3 are complementary. The sequences that
flank a particular polymorphic site are thus obtained by combining
the proximal sequence of one row with the distal sequence shown in
the same row.
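The complementarity relationships described above can be checked mechanically. The sketch below is offered only as an illustration. It copies SEQ ID NO:1-16 from the Sequence Listing and assumes, as the listing entries suggest, that each SNP occupies four consecutive SEQ ID numbers in this order: strand-1 proximal, strand-1 distal, strand-2 proximal, strand-2 distal (177-2: NO:1-4, 595-3: NO:5-8, 090-2: NO:9-12, 324-1: NO:13-16). The helpers `seq_ids` and `check_snp` are hypothetical names.

```python
# Flanking sequences copied from SEQ ID NO:1-16 (all written 5'->3').
SEQS = {
    1: "GCAGCTCTAAGTGCTGTGGG", 2: "TGCAGAAATTCTAAGGTGTT",
    3: "AACACCTTAGAATTTCTGCA", 4: "CCCACAGCACTTAGAGCTGC",
    5: "AGCTCTGGGATGATCCACTA", 6: "TGAGGGAAAAATGATGATGC",
    7: "GCATCATCATTTTTCCCTCA", 8: "TAGTGGATCATCCCAGAGCT",
    9: "AAAACTAATTTGATGGCCAT", 10: "AAAGTCAGAACAATGATTGC",
    11: "GCAATCATTGTTCTGACTTT", 12: "ATGGCCATCAAATTAGTTTT",
    13: "CACAAGGCCCAAGAACAGGA", 14: "TGAGTTCAGCGAGTGTCAGA",
    15: "TCTGACACTCGCTGAACTCA", 16: "TCCTGTTCTTGGGCCTTGTG",
}
CLONES = ["177-2", "595-3", "090-2", "324-1"]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse so the result also reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def seq_ids(k):
    """SEQ ID numbers for the k-th SNP (k = 1, 2, ...), assuming four per SNP."""
    base = 4 * (k - 1)
    return {"prox1": base + 1, "dist1": base + 2, "prox2": base + 3, "dist2": base + 4}


def check_snp(k):
    """True if proximal/distal pairs on opposite strands are reverse complements."""
    ids = seq_ids(k)
    ok_outer = reverse_complement(SEQS[ids["prox1"]]) == SEQS[ids["dist2"]]
    ok_inner = reverse_complement(SEQS[ids["dist1"]]) == SEQS[ids["prox2"]]
    return ok_outer and ok_inner


for k, clone in enumerate(CLONES, start=1):
    print(clone, seq_ids(k), check_snp(k))  # True for all four clones
```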
The present specification refers to the above sequences by
their sequence ID numbers (i.e., SEQ ID NO). To facilitate such
disclosure, algebraic notation (such as "2n+1") is employed in
accordance with conventional algebra. Thus, the designation "SEQ
ID NO:(2n+1)" denotes SEQ ID NO:5 where n=2, SEQ ID NO:7 where
n=3, and so on.

EXAMPLE 3 

ALLELIC FREQUENCY ANALYSIS OF EQUINE POLYMORPHISMS IN SMALL
POPULATION STUDIES

Small population studies (50-60 animals) of these DNA
sequence polymorphisms have been carried out on a number of the
polymorphic sites using Genetic Bit Analysis (GBA), the preferred
solid-phase, single-nucleotide interrogation system (Goelet, P. et
al., WO 92/15712). The seven steps of the most preferred embodiment
are illustrated in Figure 7:

Step 1: DNA preparation. 

Step 2: Amplification of Target Sequence. After DNA is
prepared from the sample, a specific region of the sample genome
(locus) is amplified using the PCR. One of the PCR primers is
modified with four phosphorothioate linkages at the 5'-end.

Step 3: Exonuclease Digestion and the Generation of Single-
Stranded Template. The PCR product is digested with exonuclease,
leaving the phosphorothioated strand intact.

Step 4: Hybridization to Capture the Amplified Template. The
template strand is next hybridized to the appropriate GBA primer
that is immobilized on the surface of a microtiter well.

Step 5: Single Base Extension with Polymerase. DNA
polymerase and haptenated ddNTPs are used to extend the GBA
primer by one base in a template-dependent manner.

Step 6: Colorimetric Detection of the Extension Product.
After the template is washed away using NaOH, the haptenated base
is detected using an anti-hapten conjugate and the appropriate
colorimetric substrate.

Step 7: Computer-Assisted Interpretation of Genotype. The
colorimetric data from a number of loci are converted to an SNP
genotype for the particular individual tested.
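Steps 4 and 5 rely on the GBA primer annealing so that its 3' end lies immediately adjacent to the polymorphic site, which leaves the polymerase exactly one template base to copy. The sketch below models only that base-pairing logic; it does not model the chemistry or the haptens. As a purely hypothetical illustration, it uses the clone 177-2 flanks: the C/T strand (SEQ ID NO:1 + site + SEQ ID NO:2) serves as the template, and SEQ ID NO:3 stands in for a GBA primer. The actual GBA primer for this locus is not given here, and `extension_base` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extension_base(template, primer):
    """Return the ddNTP base added to `primer` when annealed to `template`.

    Both sequences are written 5'->3'. The primer pairs antiparallel, so its
    binding site on the template reads as reverse_complement(primer). The
    polymerase extends the primer's 3' end by copying the next template base,
    which lies immediately 5' of that binding site.
    """
    site = reverse_complement(primer)
    start = template.find(site)
    if start <= 0:
        raise ValueError("primer does not anneal next to an interrogable base")
    return COMPLEMENT[template[start - 1]]


SEQ1 = "GCAGCTCTAAGTGCTGTGGG"  # 5'-proximal, C/T strand (clone 177-2)
SEQ2 = "TGCAGAAATTCTAAGGTGTT"  # 3'-distal, C/T strand
SEQ3 = "AACACCTTAGAATTTCTGCA"  # 5'-proximal, G/A strand (illustrative primer)

for allele in "CT":
    template = SEQ1 + allele + SEQ2
    print(allele, "->", extension_base(template, SEQ3))
# C -> G
# T -> A
```

The extended primer then reads SEQ ID NO:3 followed by G or A, which is the G/A strand in the Table 1 layout described above.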

The method is preferably conducted in the following manner: 

GBA Template Preparation.

Amplification of genomic sequences was performed using the
polymerase chain reaction (PCR). In a first step, one hundred
nanograms of genomic DNA was used in a reaction mixture
containing each first-round primer at a concentration of 2 µM, 10
mM Tris pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin and 0.05
units per µl Taq DNA polymerase (AmpliTaq, Perkin Elmer).

To obtain single-stranded template for use with solid-phase
immobilized primer, either of two methods may be used. First, the
amplification may be mediated using primers that contain 4
phosphorothioate-nucleotide derivatives, as taught by Nikiforov, T.
(U.S. patent application serial no. 08/005,061). Alternatively, a
second round of PCR may be performed using "asymmetric" primer
concentrations. The products of the first reaction are diluted
1/1000 in a second reaction. One of the second-round primers is
used at the standard concentration of 2 µM, while the other is used at
0.08 µM. Under these conditions, single-stranded molecules are
synthesized during the reaction.

Solid phase immobilization of nucleic acids.

For the GBA procedure, solid-phase attachment of the
template-primer complex simplifies washes, buffer exchanges,
etc., and in principle this attachment can be either via the template
or the primer. In practice, however, especially when non-gel-based
detection methods are employed, attachment via the primer is
preferable. This format allows the use of stringent washes (e.g.,
0.2 N NaOH) to remove impurities and reaction side products while
retaining the haptenated dideoxynucleotide covalently linked to the
3'-end of the primer.

Therefore, for GBA reactions in 96-well plates (Nunc Nunclon
plates, Roskilde, Denmark), the GBA primer was covalently coupled
to the plate. This was accomplished by incubating 10 pmoles per
well of primer having a 5'-amino group in 50 µl of 3 mM sodium
phosphate buffer, pH 6, containing 20 mM
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC), overnight at
room temperature. After coupling, the plate was washed three times
with TNTw.

GBA in Microwell Plates.

Hybridization of single-stranded DNA to primers covalently
coupled to 96-well plates was accomplished by adding an equal
volume of 3 M NaCl, 20 mM EDTA to the single-stranded PCR
product and incubating each well with 20 µl of this mixture at 20°C
for 30 minutes. The plate was subsequently washed three times
with TNTw. Twenty µl of polymerase extension mix, containing
ddNTPs (3 µM each, one of which was biotinylated), 5 mM DTT,
7.5 mM sodium isocitrate, 5 mM MnCl2 and 0.04 units per µl of
Klenow DNA polymerase, was then added to each well and
incubated for 5 minutes at room temperature.

Following the extension reaction, the plate was washed once
with TNTw. Template strands were removed by incubating wells
with 50 µl of 0.2 N NaOH for 5 minutes at room temperature, then
washing each well with another 50 µl of 0.2 N NaOH. The plate was
then washed three times with TNTw. Incorporation of biotinylated
ddNTPs was measured by an enzyme-linked assay. Each well was
incubated with 20 µl of streptavidin-conjugated horseradish
peroxidase (1/1000 dilution in TNTw of product purchased from
BRL, Gaithersburg, MD) with agitation for 30 minutes at room
temperature. After washing 5 times with TNTw, 100 µl of
o-phenylenediamine (OPD; 1 mg/ml in 0.1 M citric acid, pH 4.5) (BRL)
containing 0.012% H2O2 was added to each well. The amount of
bound enzyme was determined kinetically with a Molecular Devices
model "Vmax" 96-well spectrophotometer.

Figures 8A and 8B illustrate how horse parentage data appear at
the microtiter plate level. In standard horse parentage testing,
samples are arrayed 85 to a plate (columns 1-11) plus controls
(column 12). For each horse locus, the presence of each of the two
known alleles is determined by base-specific interrogation on
separate plates. The two plates shown in Figures 8A and 8B are
identical in PCR template and GBA primer and differ only in the
biotinylated ddNTP that was used in the extension reaction
(biotin-ddCTP in Figure 8A and biotin-ddTTP in Figure 8B). Upon
addition of the colorimetric reagent (OPD), the absorbance of the
resultant color was measured in a Molecular Devices microtiter plate
reader, and the raw data were generated in milliOD/min per well.
Gray-scale representations of the raw absorbance data for the two
plates are shown in the figures, arranged in exactly the same order
as on the microtiter plates; gray-scale intensity correlates directly
with color production. At this biallelic locus the bases detected
are C (Figure 8A) and T (Figure 8B). Approximately 40% of horses
tested to date are heterozygotes (the sample in well A1, for
example), and the remainder are homozygous for C (A2, for example)
or T (B3, for example). Synthetic template controls include a
control C homozygote (well E12), a control T homozygote (well F12)
and a control heterozygote (well G12). The scale refers to
milliOD/min at 450 nm. Most positive samples had signals above
100 in this case. In this format, a horse parentage test based on a
panel of 28 biallelic markers would require 56 such plates for
complete typing of the 85 horses.
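The plate count in the last sentence follows from using one plate per allele-specific ddNTP per locus. The following is a minimal sketch. It assumes 85 samples per plate, as stated above, and it also assumes that additional samples simply require additional batches of plates; the text does not address that second point. `plates_required` is a hypothetical helper.

```python
import math

SAMPLES_PER_PLATE = 85  # columns 1-11; column 12 is reserved for controls
PLATES_PER_LOCUS = 2    # one plate per biotinylated ddNTP (one per allele)


def plates_required(n_loci, n_samples):
    """Plates needed to type n_samples animals at n_loci biallelic loci."""
    batches = math.ceil(n_samples / SAMPLES_PER_PLATE)
    return n_loci * PLATES_PER_LOCUS * batches


print(plates_required(28, 85))  # 56, matching the 28-marker panel above
```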

Fifty-one random, unrelated horses and three sire/dam/foal
families were chosen for study. The aims were to establish that a
reasonable subset of the DNA markers found to date was likely to
provide the desired p(exc) > 0.90, and to assess the power of the
DNA markers so that they could be prioritized for definitive allelic
frequency measurements.

PCR-generated single-stranded template DNA was prepared
from the genomic DNA of each animal. This material was typed
with respect to nucleotide variants using GBA. The genotype data
obtained for each polymorphic site are summarized in Table 2. From
these genotype data, allelic frequencies were determined and used to
calculate the p(exc) of each site. The cumulative p(exc) for the
group of 18 sites listed in Tables 1 and 2 is 0.955. In Tables 2-5,
the genotype is indicated as either homozygote (i.e., PP or QQ) or
heterozygote (PQ). The numbers in parentheses denote the number
of animals observed with each genotype.
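Per-locus exclusion probabilities combine across independent loci because a falsely accused sire escapes exclusion only if he escapes it at every locus. This gives cum p(exc) = 1 - Π(1 - p(exc)_i). The per-site formula used for the 0.955 figure is not specified here. The sketch below therefore assumes the standard expression for a codominant biallelic locus when the dam is known, p(exc) = pq(1 - pq), and it also assumes that the loci are unlinked. The function names are hypothetical.

```python
import math


def pexc_biallelic(p):
    """Exclusion probability for a codominant biallelic locus, dam known.

    Standard result under Hardy-Weinberg equilibrium (assumed here);
    p and q = 1 - p are the allele frequencies.
    """
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def cumulative_pexc(per_locus):
    """Combine per-locus exclusion probabilities over independent loci."""
    return 1.0 - math.prod(1.0 - x for x in per_locus)


print(pexc_biallelic(0.5))          # 0.1875, the maximum for two alleles
print(cumulative_pexc([0.5, 0.5]))  # 0.75
```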
EXAMPLE 4

PARENTAGE TESTING 

A family consisting of a sire, dam and offspring was typed
with respect to the 18 variable sites discussed above, and no
exclusions were found. This family had not previously been blood
typed. Using the preliminary allelic frequencies given in Table 2, it
is possible to construct a p(exc) table pertaining to this specific
case (Table 3). In general, this table is constructed assuming that
the identity of the dam is not in question (although in practice, it
is possible to exclude the mare if neither of her alleles is inherited
by the foal). Table 3 shows the typing data for the foal and its dam,
with the sites tested listed in order of informativeness in this
case. The overall cumulative p(exc) using 18 loci was 0.942.
                                       0.059   0.941
177-1   CC   CC   AA   0.018   0.982   0.058   0.942
459-2   CC   CC   GG   0.003   0.997   0.058   0.942
007-1   CG   CG   -    0.000   1.000   0.058   0.942
007-2   AG   AG   -    0.000   1.000   0.058   0.942
177-2   CT   CT   -    0.000   1.000   0.058   0.942
177-3   AG   AG   -    0.000   1.000   0.058   0.942
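The Table 3 rows can be reproduced from the allele frequencies in Table 4 under one reading of the columns. The column headers are lost here, so this reading is an inference and not something the table confirms. On this reading, the columns are: locus, the foal and dam genotypes, the sire genotype that would be excluded, the frequency of that genotype taken as the case-specific p(exc), its complement, and the running cumulative values. For example, at 177-1 the foal and dam are both CC, so the sire must have transmitted C. Only AA sires are then excluded, with frequency 0.133^2 ≈ 0.018. The sketch below implements that logic and assumes Hardy-Weinberg genotype frequencies. `case_pexc` is a hypothetical name.

```python
def case_pexc(foal, dam, freqs):
    """Chance that a random sire is excluded, given the typed foal and dam.

    foal, dam: two-letter genotypes such as "CC" or "CG".
    freqs: allele frequencies at the locus, e.g. {"A": 0.133, "C": 0.867}.
    """
    # Alleles the sire could have transmitted: for each way of splitting the
    # foal's genotype, keep the non-maternal allele if the dam carries the other.
    paternal = set()
    for i in (0, 1):
        maternal, candidate = foal[i], foal[1 - i]
        if maternal in dam:
            paternal.add(candidate)
    if not paternal:
        raise ValueError("dam carries neither foal allele; she is excluded")
    # A sire is excluded only if he carries none of the possible paternal
    # alleles; under Hardy-Weinberg that probability is (1 - sum of freqs)^2.
    missing = 1.0 - sum(freqs[a] for a in paternal)
    return max(missing, 0.0) ** 2


rows = [
    ("177-1", "CC", "CC", {"A": 0.133, "C": 0.867}),
    ("459-2", "CC", "CC", {"C": 0.949, "G": 0.051}),
    ("007-1", "CG", "CG", {"C": 0.608, "G": 0.392}),
]
for locus, foal, dam, freqs in rows:
    print(locus, round(case_pexc(foal, dam, freqs), 3))
# 177-1 0.018
# 459-2 0.003
# 007-1 0.0
```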
EXAMPLE 5

IDENTITY TESTING 

It is of interest to use the population analysis group to
derive preliminary information concerning other aspects of the
marker panel. For example, using the allelic frequency data, it is
possible to calculate a probability of identity [p(ID)] value for the
18 sites, which is equal to 4.79 x 10^-7, or approximately 1 in 2.1
million. Thus, one would predict that no two horses examined in the
population group would have the same genotype, and computer
analysis of the genotype database revealed this to be the case. As
shown in Table 4, p(ID) reaches very small values with analysis of
comparatively few loci. Using the top seven sites, the probability
of two random animals having different genotypes is already 99.9%.



TABLE 4

LOCUS   GENOTYPE 1   GENOTYPE 2   GENOTYPE 3   p       q       p(ID)   cum p(ID)
        PP (#)       PQ (#)       QQ (#)
177-2   CC (18)      CT (23)      TT (18)      0.500   0.500   0.375   0.375
595-3   AA (14)      AG (28)      GG (11)      0.528   0.472   0.376   0.141
090-2   AA (13)      AG (28)      GG (17)      0.466   0.534   0.376   0.053
324-1   CC (11)      CT (30)      TT (19)      0.433   0.567   0.380   0.020
129-1   AA (7)       AT (33)      TT (20)      0.392   0.608   0.388   0.008
007-1   CC (22)      CG (29)      GG (9)       0.608   0.392   0.388   0.003
324-2   CC (21)      CT (24)      TT (9)       0.611   0.389   0.388   0.001
177-3   AA (26)      AG (25)      GG (9)       0.642   0.358   0.397   4.67x10^-4
595-1   AA (25)      AG (21)      GG (5)       0.696   0.304   0.422   1.97x10^-4
007-3   AA (27)      AG (32)      GG (1)       0.717   0.283   0.435   8.57x10^-5
459-1   AA (5)       AC (22)      CC (31)      0.276   0.724   0.440   3.77x10^-5
085-1   CC (32)      CG (24)      GG (4)       0.733   0.267   0.447   1.68x10^-5
007-2   AA (3)       AG (25)      GG (31)      0.263   0.737   0.450   7.58x10^-6
474-1   AA (35)      AT (21)      TT (4)       0.758   0.242   0.468   3.55x10^-6
178-1   AA (38)      AG (16)      GG (4)       0.793   0.207   0.505   1.79x10^-6
595-2   GG (34)      GT (13)      TT (3)       0.810   0.190   0.527   9.45x10^-7
177-1   AA (2)       AC (12)      CC (46)      0.133   0.867   0.618   5.84x10^-7
459-2   CC (53)      CG (6)       GG (0)       0.949   0.051   0.821   4.79x10^-7
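The p(ID) column can be reproduced from the genotype counts. For the rows checked below, the Table 4 values agree to three decimals with the Hardy-Weinberg expected probability that two random animals share a genotype, p^4 + (2pq)^2 + q^4. Here p is taken from the counts as (2·PP + PQ)/(2N). The cumulative column is the running product over loci. The sketch below assumes that reading, and the function names are hypothetical.

```python
def allele_freq(n_pp, n_pq, n_qq):
    """Frequency of allele P from genotype counts."""
    return (2 * n_pp + n_pq) / (2 * (n_pp + n_pq + n_qq))


def pid_biallelic(p):
    """Expected chance that two random animals share a genotype (Hardy-Weinberg)."""
    q = 1.0 - p
    return p**4 + (2 * p * q) ** 2 + q**4


# First three rows of Table 4: genotype counts (PP, PQ, QQ).
counts = {"177-2": (18, 23, 18), "595-3": (14, 28, 11), "090-2": (13, 28, 17)}

cumulative = 1.0
for locus, (pp, pq, qq) in counts.items():
    pid = pid_biallelic(allele_freq(pp, pq, qq))
    cumulative *= pid  # running product, as in the cum p(ID) column
    print(locus, round(pid, 3), round(cumulative, 3))
# 177-2 0.375 0.375
# 595-3 0.376 0.141
# 090-2 0.376 0.053
```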
False Report Rate

In the current study, two types of potential false reports can
be encountered, due to either (1) PCR failures or
(2) incompatibility between the genotypes obtained on opposite
strands. Only data from those animals that had been successfully
typed on both strands were included in the allelic frequency
calculations. Sixty horses typed with respect to 18 sites amounts
to 1,080 genotypings. Overall, 95% of all typing experiments were
successful. No typing errors were due to traditional PCR
failures. False reports (3.8%) were encountered at the GBA step,
either because the PCR was unsuccessful at the single-strand step
or because of operator error. A further 1.1% of all typings produced
incompatible data between the strands for unknown reasons.

In sum, the GBA (genetic bit analysis) method is a simple,
convenient, and automatable method for interrogating SNPs. In this
method, sequence-specific annealing to a solid-phase-bound primer
is used to select a unique polymorphic site in a nucleic acid
sample, and this site is interrogated by a highly accurate DNA
polymerase reaction using a set of novel non-radioactive
dideoxynucleotide analogs. One of the most attractive features of
the GBA approach is that, because the actual allelic discrimination
is carried out by the DNA polymerase, one set of reaction
conditions can be used to interrogate many different polymorphic
loci. This feature permits cost reductions in complex DNA tests by
exploitation of parallel formats and provides for rapid development
of new tests.

The intrinsic error rate of the GBA procedure in its present
format is believed to be low; the signal-to-noise ratio in terms of
correct vs. incorrect nucleotide incorporation for homozygotes
appears to be approximately 20:1. GBA is thus sufficiently
quantitative to allow the reliable detection of heterozygotes in
genotyping studies. The presence of all four dideoxynucleoside
triphosphates as the sole nucleotide substrates in the DNA
polymerase-mediated extension reaction heightens the fidelity of
genotype determinations by suppressing misincorporation. GBA can
be used in any application where point mutation analyses are
presently employed, including genetic mapping and linkage studies,
genetic diagnoses, and identity/paternity testing, provided that the
surrounding DNA sequence is known.

EXAMPLE 6 

ANALYSIS OF A HUMAN SNP 

Human single nucleotide polymorphisms may be used in the 
same manner as the above-described equine polymorphisms. 
Examples of suitable human polymorphisms are presented in Table 
5. 
For the purpose of validating the strategy of converting
human SNPs to a GBA test format, a phenotypically neutral SNP site
was converted and tested by GBA. This site was selected from the
Johns Hopkins University OMB database of human polymorphisms.
The site is met-H on chromosome 7 at q31, mutation position 127,
A to G (Horn, G.T. et al., Clin. Chem. 36, 1614-1619, 1990). The
following oligonucleotides were synthesized (p = phosphorothioate):

PCR primer no. 1552 (SEQ ID NO:93)
5'-CpApTpCpCATGTAGGAGAGCCTTAGTC

PCR primer no. 1553 (SEQ ID NO:94)
5'-CCATTTTTGTGTCTTCTAGTCTAAGG

GBA primer no. 1554 (SEQ ID NO:95)
5'-TTGAAAGATCGTCAGAAAAATCC
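In this notation, a lowercase p between two bases marks a phosphorothioate linkage. Primer 1552 therefore carries four such linkages at its 5' end, which matches the four-linkage modification described in Step 2 of the GBA procedure. The small parser below is a hypothetical helper, written only to make the notation explicit.

```python
def parse_primer(oligo):
    """Split a primer written like "5'-CpApTpC..." into bases and linkages.

    Returns (plain_sequence, positions), where positions lists the 1-based
    index of the base on the 5' side of each phosphorothioate linkage.
    """
    body = oligo[3:] if oligo.startswith("5'-") else oligo
    bases, positions = [], []
    for ch in body:
        if ch == "p":
            positions.append(len(bases))  # linkage follows the base just read
        else:
            bases.append(ch)
    return "".join(bases), positions


seq, links = parse_primer("5'-CpApTpCpCATGTAGGAGAGCCTTAGTC")
print(seq, len(seq))  # CATCCATGTAGGAGAGCCTTAGTC 24
print(links)          # [1, 2, 3, 4]
```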

Human DNA samples were randomly selected from the DNA
archives of two families available from the Centre d'Etude du
Polymorphisme Humain (CEPH) family collection. A negative
control containing no DNA was also used. Sample DNAs were
amplified by PCR using the above primers, and the resulting product
was analyzed by GBA for the two potential bases at the polymorphic
site, G and A. GBA results were obtained by an endpoint reading of
absorbance at 450 nm in a microtiter plate reader. The data are
presented in Table 6.

Samples 1, 2, 4, 6 and 8 were homozygous for A, samples 7 and
9 were homozygous for G, and samples 3 and 5 were GA
heterozygotes. These DNAs have not been tested for this biallelism
by any other method to date.
TABLE 6

No.      CEPH DNA No.   Absorbance at 450 nm (A450)   Genotype
                        Base G        Base A
1        1333-10        .100          .556            AA
2        1333-02        .084          .782            AA
3        1333-04        .372          .369            GA
4        1333-05        .081          .905            AA
5        1333-07        .321          .346            GA
6        1333-08        .084          .803            AA
7        1340-09        .675          .092            GG
8        1340-10        .084          .756            AA
9        1340-12        .537          .096            GG
No DNA   N/A            .076          .097            N/A



While the invention has been described in connection with
specific embodiments thereof, it will be understood that it is
capable of further modifications and this application is intended to
cover any variations, uses, or adaptations of the invention
following, in general, the principles of the invention and including
such departures from the present disclosure as come within known
or customary practice within the art to which the invention
pertains and as may be applied to the essential features
hereinbefore set forth and as follows in the scope of the appended
claims.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: GOELET, PHILIP
KNAPP, MICHAEL R.

(ii) TITLE OF INVENTION: SINGLE NUCLEOTIDE POLYMORPHISMS AND
THEIR USE IN GENETIC ANALYSIS

(iii) NUMBER OF SEQUENCES: 95

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: HOWREY & SIMON
(B) STREET: 1299 PENNSYLVANIA AVENUE, N.W.
(C) CITY: WASHINGTON
(D) STATE: D.C.
(E) COUNTRY: US
(F) ZIP: 20004

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: AUERBACH, JEFFREY I.
(B) REGISTRATION NUMBER: 32,680
(C) REFERENCE/DOCKET NUMBER: 683-104-CIP

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (202) 383-7451
(B) TELEFAX: (202) 383-6610

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCAGCTCTAA GTGCTGTGGG 20

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TGCAGAAATT CTAAGGTGTT 20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

AACACCTTAG AATTTCTGCA 20

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCCACAGCAC TTAGAGCTGC 20

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 595-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

AGCTCTGGGA TGATCCACTA 20

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 595-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

TGAGGGAAAA ATGATGATGC 20

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 595-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GCATCATCAT TTTTCCCTCA 20



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 595-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TAGTGGATCA TCCCAGAGCT 20

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 090-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AAAACTAATT TGATGGCCAT 20

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 090-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

AAAGTCAGAA CAATGATTGC 20

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
(B) CLONE: 090-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GCAATCATTG TTCTGACTTT 20



(2) INFORMATION FOR SEQ ID NO:12: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE; nucleic acid 

(C) STRANDEDNESS; single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(tv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 090-2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 
ATGGCCATCA AATTAGTTTT 
(2) INFORMATION FOR SEQ ID NO:13: 

(I) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 324-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
CACAAGGCCC AAGAACAGGA 
(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
TGAGTTCAGC GAGTGTCAGA 20

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
TCTGACACTC GCTGAACTCA 20

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
TCCTGTTCTT GGGCCTTGTG 20

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 129-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
TGGGAAAGAC CACATTATTT 20

(2) INFORMATION FOR SEQ ID NO:18: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 129-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
GTTCCCTTTT GTTTCAGACC 
(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 129-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: 
GGTCTGAAAC AAAAGGGAAC 
(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 129-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
AAATAATGTG GTCTTTCCCA 20

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
CATGAGTAAG AAGCATCCGG 20

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
CCATGGAGTC ATAGATAAGT

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 007-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
ACTTATCTAT GACTCCATGG 






(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
CCGGATGCTT CTTACTCATG 20

(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CCCAAGAACA GGATTGAGTT 20

(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
AGCGAGTGTC AGAGTTGTGT 20

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
ACACAACTCT GACACTCGCT 20

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 324-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
AACTCAATCC TGTTCTTGGG 20

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
AGCAAGAAAT GGGGGGCCTT 20

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 177-3 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
GTCCTACAAT TGCCAGGAAG 
(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
CTTCCTGGCA ATTGTAGGAC

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
AAGGCCCCCC ATTTCTTGCT 20

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
GAATATCAAT ATATATATAT 20

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
TGTGTGTGTG TGTATTTGCT 20

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
AGCAAATACA CACACACACA 20

(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
ATATATATAT ATTGATATTC 20

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
GCCATAATTA AGCCTGTATT 20

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
GTTTGTTTTA AATTTTGTGA 20

(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
TCACAAAATT TAAAACAAAC 20

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
AATACAGGCT TAATTATGGC 20

(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
GTGTAGAGTA GTTCAAGGAC 20

(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
ATGTCTTATA CCTCCCTTTT 20

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
AAAAGGGAGG TATAAGACAT 20

(2) INFORMATION FOR SEQ ID NO:44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
GTCCTTGAAC TACTCTACAC 20

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 085-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:
GTGAACGGAG AGCAGGCCTT 20

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 085-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
CCTGCTGAAG CCTCAGACCG 20

(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 085-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
CGGTCTGAGG CTTCAGCAGG 20

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 085-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:
AAGGCCTGCT CTCCGTTCAC 20

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:
CTGCTCTTTA GACTATGACC 20

(2) INFORMATION FOR SEQ ID NO:50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
TCAACCTTGC ATCATGAGCT 20

(2) INFORMATION FOR SEQ ID NO:51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 007-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:
AGCTCATGAT GCAAGGTTGA 20

(2) INFORMATION FOR SEQ ID NO:52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 007-2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 
GGTCATAGTC TAAAGAGCAG 
(2) INFORMATION FOR SEQ ID NO:53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 474-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 
TTTGAGCTGG GACCTCAGTC 
(2) INFORMATION FOR SEQ ID NO:54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 474-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
TCTCCTGCCT TTAGACTCGA 20

(2) INFORMATION FOR SEQ ID NO:55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 474-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
TCGAGTCTAA AGGCAGGAGA 20

(2) INFORMATION FOR SEQ ID NO:56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 474-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
GACTGAGGTC CCAGCTCAAA 20

(2) INFORMATION FOR SEQ ID NO:57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 178-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
GAACCTCTGG GCCGTGGATA 20

(2) INFORMATION FOR SEQ ID NO:58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 178-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
TTGTTCAGAA GCACAGGTGA 20

(2) INFORMATION FOR SEQ ID NO:59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 178-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:
TCACCTGTGC TTCTGAACAA 20

(2) INFORMATION FOR SEQ ID NO:60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 178-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:
TATCCACGGC CCAGAGGTTC 20

(2) INFORMATION FOR SEQ ID NO:61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:
GTATTTGCTA GCTCTGGGAT 20

(2) INFORMATION FOR SEQ ID NO:62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:
ATCCACTAAT GAGGGAAAAA 20

(2) INFORMATION FOR SEQ ID NO:63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 595-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:
TTTTTCCCTC ATTAGTGGAT 20

(2) INFORMATION FOR SEQ ID NO:64: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 595-2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: 
ATCCCAGAGC TAGCAAATAC 
(2) INFORMATION FOR SEQ ID NO:65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 177-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: 
GAAGTTGTGG GACAGATGTG 
(2) INFORMATION FOR SEQ ID NO:66:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:
AGAGATGCAG CTCTAAGTGC 20

(2) INFORMATION FOR SEQ ID NO:67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67:
GCACTTAGAG CTGCATCTCT 20

(2) INFORMATION FOR SEQ ID NO:68:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 177-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:
CACATCTGTC CCACAACTTC

(2) INFORMATION FOR SEQ ID NO:69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69:
CCATGAGGAA GCCTCCACAA 20

(2) INFORMATION FOR SEQ ID NO:70:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70:
GTCCCAATAG TCTGGGATTC 20

(2) INFORMATION FOR SEQ ID NO:71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:

(B) CLONE: 459-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:
GAATCCCAGA CTATTGGGAC 20

(2) INFORMATION FOR SEQ ID NO:72: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 459-2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 
TTGTGGAGGC TTCCTCATGG 
(2) INFORMATION FOR SEQ ID NO:73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: IGKC 2p12 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 
AAAGCAGACT ACGAGAAACA CAAA 
(2) INFORMATION FOR SEQ ID NO:74:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
TCTACGCCTG CGAAGTCACC CATC

(2) INFORMATION FOR SEQ ID NO:75:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75:
GATGGGTGAC TTCGCAGGCG TAGA

(2) INFORMATION FOR SEQ ID NO:76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
TTTGTGTTTC TCGTAGTCTG CTTT 24

(2) INFORMATION FOR SEQ ID NO:77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77:
CTCCTGCAAT TGACAGAGAG CTCC 24

(2) INFORMATION FOR SEQ ID NO:78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:
GAGGCAGAGA ACAGCACCCA AGGT

(2) INFORMATION FOR SEQ ID NO:79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79:
ACCTTGGGTG CTGTTCTCTG CCTC 24

(2) INFORMATION FOR SEQ ID NO:80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:80:
GGAGCTCTCT GTCAATTGCA GGAG 24

(2) INFORMATION FOR SEQ ID NO:81: 



(i) SEQUENCE CHARACTERISTICS: 

m 30 (A) LENGTH: 24 base pairs 

^ (B) TYPE: nucleic acid 

\± (C) STRANDEDNESS: single 

J| (D) TOPOLOGY: linear 

2 35 (ii) MOLECULE TYPE: DNA (genomic) 

M= (iii) HYPOTHETICAL: NO 



(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: LDLR 19p13.3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81: 
CTCCATCTCA AGCATCGATG TCAA 24

(2) INFORMATION FOR SEQ ID NO:82:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: LDLR 19p13.3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: 
GGGGGCAACC GGAAGACCAT CTTG 



(2) INFORMATION FOR SEQ ID NO:83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: LDLR 19p13.3





(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 
CAAGATGGTC TTCCGGTTGC CCCC 24

(2) INFORMATION FOR SEQ ID NO:84:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: LDLR 19p13.3 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: 
TTGACATCGA TGCTTGAGAT GGAG 



(2) INFORMATION FOR SEQ ID NO:85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:85: 
GTTTGGTCTA AGTTGCTGAT TACC 



(2) INFORMATION FOR SEQ ID NO:86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:86:
GGATTTTTCT GACGATCTTT CAAC 



(2) INFORMATION FOR SEQ ID NO:87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:87: 
GTTGAAAGAT CGTCAGAAAA ATCC 

(2) INFORMATION FOR SEQ ID NO:88:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: 
GGTAATCAGC AACTTAGACC AAAC 



(2) INFORMATION FOR SEQ ID NO:89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PROC 2q13-q21 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: 
GCTGACAGCG GCCCACTGCA TGGA 



(2) INFORMATION FOR SEQ ID NO:90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PROC 2q13-q21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:90: 
GAGTCCAAGA AGCTCCTTGT CAGG 24 

(2) INFORMATION FOR SEQ ID NO:91: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PROC 2q13-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:91: 
CCTGACAAGG AGCTTCTTGG ACTC 24

(2) INFORMATION FOR SEQ ID NO:92:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PROC 2q13-q21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:92: 
TCCATGCAGT GGGCCGCTGT CAGC 
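Entries 77 through 92 appear to come in pairs in which one oligonucleotide is the exact reverse complement of another entry carrying the same CLONE annotation (for example, the reverse complement of SEQ ID NO:77 is SEQ ID NO:80). The listing itself does not state this relationship, so the short Python sketch below, which is not part of the original application, checks it directly from the sequences as printed. The helper names `reverse_complement` and `complementary_pairs` are hypothetical, and the check assumes the printed sequences are free of transcription errors.

```python
# Base-pairing table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; the spaces that group the
    listing into 10-base blocks are removed first (hypothetical helper)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


# SEQ ID NO:77 to NO:92, copied as printed in the listing.
SEQS = {
    77: "CTCCTGCAAT TGACAGAGAG CTCC",
    78: "GAGGCAGAGA ACAGCACCCA AGGT",
    79: "ACCTTGGGTG CTGTTCTCTG CCTC",
    80: "GGAGCTCTCT GTCAATTGCA GGAG",
    81: "CTCCATCTCA AGCATCGATG TCAA",
    82: "GGGGGCAACC GGAAGACCAT CTTG",
    83: "CAAGATGGTC TTCCGGTTGC CCCC",
    84: "TTGACATCGA TGCTTGAGAT GGAG",
    85: "GTTTGGTCTA AGTTGCTGAT TACC",
    86: "GGATTTTTCT GACGATCTTT CAAC",
    87: "GTTGAAAGAT CGTCAGAAAA ATCC",
    88: "GGTAATCAGC AACTTAGACC AAAC",
    89: "GCTGACAGCG GCCCACTGCA TGGA",
    90: "GAGTCCAAGA AGCTCCTTGT CAGG",
    91: "CCTGACAAGG AGCTTCTTGG ACTC",
    92: "TCCATGCAGT GGGCCGCTGT CAGC",
}


def complementary_pairs(seqs: dict[int, str]) -> list[tuple[int, int]]:
    """Return (a, b) with a < b where SEQ ID NO:b is the reverse
    complement of SEQ ID NO:a (hypothetical helper)."""
    ids = sorted(seqs)
    return [
        (a, b)
        for a in ids
        for b in ids
        if a < b and reverse_complement(seqs[a]) == seqs[b].replace(" ", "")
    ]


print(complementary_pairs(SEQS))
```

Run as written, the sketch reports eight pairs, each joining two entries annotated with the same clone (ILIB, LDLR, MET-H or PROC).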

(2) INFORMATION FOR SEQ ID NO:93:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:93:
CATCCATGTA GGAGAGCCTT AGTC 24



(2) INFORMATION FOR SEQ ID NO:94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:94: 
CCATTTTTGT GTCTTCTAGT CTAAGG 

(2) INFORMATION FOR SEQ ID NO:95:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: MET-H 7q31 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:95: 
TTGAAAGATC GTCAGAAAAA TCC 
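The stated (A) LENGTH of each entry can be checked against the sequence it describes. The Python sketch below is a minimal illustration, not a full parser for this listing format: it assumes abridged entries holding only the header, the LENGTH field and the SEQUENCE DESCRIPTION, and the function name `parse_entries` is hypothetical. It also confirms that SEQ ID NO:95 as printed equals SEQ ID NO:87 with its 5'-terminal G removed; the listing itself does not comment on this.

```python
import re

# Abridged copies of two entries from the listing above; only the fields
# this sketch needs are kept (an assumption, not the full record layout).
RECORDS = """\
(2) INFORMATION FOR SEQ ID NO:94:
(A) LENGTH: 26 base pairs
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:94:
CCATTTTTGT GTCTTCTAGT CTAAGG
(2) INFORMATION FOR SEQ ID NO:95:
(A) LENGTH: 23 base pairs
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:95:
TTGAAAGATC GTCAGAAAAA TCC
"""

# SEQ ID NO:87 as printed, with the 10-base grouping spaces removed.
SEQ_87 = "GTTGAAAGAT CGTCAGAAAA ATCC".replace(" ", "")


def parse_entries(text: str) -> dict[int, tuple[int, str]]:
    """Map each SEQ ID number to (stated length, sequence) (hypothetical helper)."""
    entries = {}
    # Each chunk after the split begins with "<number>:".
    for chunk in re.split(r"\(2\) INFORMATION FOR SEQ ID NO:", text)[1:]:
        seq_id = int(re.match(r"(\d+):", chunk).group(1))
        stated = int(re.search(r"LENGTH:\s*(\d+) base pairs", chunk).group(1))
        body = chunk.split(f"SEQUENCE DESCRIPTION: SEQ ID NO:{seq_id}:", 1)[1]
        # Keep only A/C/G/T, dropping spaces, newlines and any trailing base count.
        entries[seq_id] = (stated, re.sub(r"[^ACGT]", "", body))
    return entries


entries = parse_entries(RECORDS)
for seq_id, (stated, seq) in sorted(entries.items()):
    print(f"SEQ ID NO:{seq_id}: stated {stated}, counted {len(seq)}, match={stated == len(seq)}")

# SEQ ID NO:95 compared with SEQ ID NO:87 minus its first (5') base.
print("SEQ 95 == SEQ 87[1:]:", entries[95][1] == SEQ_87[1:])
```

Both entries report a match between the stated and counted lengths, and the final comparison prints True.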



